
RFC: Prototype & Roadmap for vector predication in LLVM
Needs Review · Public

Authored by simoll on Jan 31 2019, 3:12 AM.

Details

Summary

Vector Predication Roadmap

This proposal defines a roadmap towards native vector predication in LLVM, specifically for vector instructions with a mask and/or an explicit vector length.
LLVM currently has no target-independent means to model predicated vector instructions for modern SIMD ISAs such as AVX512, ARM SVE, the RISC-V V extension and NEC SX-Aurora.
Only some predicated vector operations, such as masked loads and stores, are available through intrinsics [MaskedIR]_.
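
To make the masked-load semantics concrete, here is a small scalar reference model in the spirit of llvm.masked.load (a sketch for illustration, not LLVM code): lanes whose mask bit is clear are never read from memory and instead take the corresponding passthru value.

```cpp
#include <array>
#include <cstddef>

// Scalar reference model of a masked load: inactive lanes are not
// dereferenced at all; they take the passthru value instead.
template <std::size_t N>
std::array<int, N> masked_load(const int *ptr, const std::array<bool, N> &mask,
                               const std::array<int, N> &passthru) {
  std::array<int, N> result{};
  for (std::size_t i = 0; i < N; ++i)
    result[i] = mask[i] ? ptr[i] : passthru[i];
  return result;
}
```

The key property is on the memory side: a masked-off lane may point at unmapped memory without faulting, which a plain load+select cannot guarantee.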

Please use docs/Proposals/VectorPredication.rst to comment on the summary.

Vector Predication intrinsics

The prototype in this patch demonstrates the following concepts:

  • Predicated vector intrinsics with an explicit mask and vector length parameter on IR level.
  • First-class predicated SDNodes on ISel level. Mask and vector length are value operands.
  • An incremental strategy to generalize PatternMatch/InstCombine/InstSimplify and DAGCombiner to work on both regular instructions and VP intrinsics.
  • DAGCombiner example: FMA fusion.
  • InstCombine/InstSimplify example: FSub pattern re-writes.
  • Early experiments on the LNT test suite (Clang static release, O3 -ffast-math) indicate that compile time on non-VP IR is not affected by the API abstractions in PatternMatch, etc.
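
The two predication operands can be summarized with a scalar reference model (an illustrative sketch; the exact prototype signatures and the policy for disabled lanes live in the patch, not here): a lane produces a defined result only when its index is below the explicit vector length and its mask bit is set.

```cpp
#include <array>
#include <cstddef>

// Reference model of a vp.fdiv-style operation with a mask and an
// explicit vector length (evl). Disabled lanes keep a neutral 0.0 in
// this model; in IR their content would be unspecified.
template <std::size_t N>
std::array<double, N> vp_fdiv(const std::array<double, N> &a,
                              const std::array<double, N> &b,
                              const std::array<bool, N> &mask,
                              std::size_t evl) {
  std::array<double, N> result{};
  for (std::size_t i = 0; i < N && i < evl; ++i)
    if (mask[i])
      result[i] = a[i] / b[i]; // the divide never executes on disabled lanes
  return result;
}
```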

Roadmap

Drawing from the prototype, we propose the following roadmap towards native vector predication in LLVM:

1. IR-level VP intrinsics

  • There is a consensus on the semantics/instruction set of VP intrinsics.
  • VP intrinsics and attributes are available on IR level.
  • TTI has capability flags for VP (`supportsVP()`?, `haveActiveVectorLength()`?).

Result: VP usable for IR-level vectorizers (LV, VPlan, RegionVectorizer), potential integration in Clang with builtins.

2. CodeGen support

  • VP intrinsics translate to first-class SDNodes (`llvm.vp.fdiv.* -> vp_fdiv`).
  • VP legalization (legalize explicit vector length to mask (AVX512), legalize VP SDNodes to pre-existing ones (SSE, NEON)).

Result: Backend development based on VP SDNodes.
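
The AVX512-style legalization mentioned above is mechanical: a target with masking but no explicit vector length can fold the EVL into the mask, because "lane i is active" means (i < evl) && mask[i]. A sketch (illustrative only):

```cpp
#include <array>
#include <cstddef>

// Fold an explicit vector length into the mask: after this, the EVL can
// be treated as the full vector width and only the mask remains.
template <std::size_t N>
std::array<bool, N> fold_evl_into_mask(const std::array<bool, N> &mask,
                                       std::size_t evl) {
  std::array<bool, N> folded{};
  for (std::size_t i = 0; i < N; ++i)
    folded[i] = (i < evl) && mask[i];
  return folded;
}
```

Going the other way (legalizing VP nodes for SSE/NEON) means dropping the predication and re-introducing it as selects, which is only valid when the unpredicated operation cannot trap.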

3. Lift InstSimplify/InstCombine/DAGCombiner to VP

  • Introduce PredicatedInstruction, PredicatedBinaryOperator, etc. helper classes that match standard vector IR and VP intrinsics.
  • Add a matcher context to PatternMatch and context-aware IR Builder APIs.
  • Incrementally lift DAGCombiner to work on VP SDNodes as well as on regular vector instructions.
  • Incrementally lift InstCombine/InstSimplify to operate on VP as well as regular IR instructions.

Result: Optimization of VP intrinsics on par with standard vector instructions.
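
The matcher-context idea can be sketched with toy types (all names here are illustrative stand-ins, not the actual PatternMatch API): the same pattern code is instantiated once with a context that accepts any operation and once with a context that additionally requires the mask/EVL operands to line up.

```cpp
// Toy stand-in for an instruction: an opcode plus an id for its mask.
struct Op {
  int opcode;
  int mask_id;
};

// Context for plain IR: nothing extra to check.
struct EmptyContext {
  bool accept(const Op &) const { return true; }
};

// Context for predicated IR: every matched op must carry the same
// predicate as the root of the pattern.
struct PredicatedContext {
  int mask_id;
  bool accept(const Op &op) const { return op.mask_id == mask_id; }
};

// A pattern templatized over the context, as in PatternMatch.h: match an
// "add" (opcode 1 in this toy encoding) only if the context agrees.
template <typename Ctx>
bool match_add(const Op &op, const Ctx &ctx) {
  return op.opcode == 1 && ctx.accept(op);
}
```

The same rewrite code then works on both regular instructions and VP intrinsics; only the context instantiation differs.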

4. Deprecate llvm.masked.* / llvm.experimental.reduce.*

  • Modernize llvm.masked.* / llvm.experimental.reduce.* by translating them to VP.
  • DCE transitional APIs.

Result: VP has superseded earlier vector intrinsics.

5. Predicated IR Instructions

  • Vector instructions have an optional mask and vector length parameter. These lower to VP SDNodes (from Stage 2).
  • Phase out VP intrinsics, only keeping those that are not equivalent to vectorized scalar instructions (reductions, shuffles, etc.).
  • InstCombine/InstSimplify expect predication in regular Instructions (Stage (3) has laid the groundwork).

Result: Native vector predication in IR.

References

.. [MaskedIR] llvm.masked.* intrinsics, https://llvm.org/docs/LangRef.html#masked-vector-load-and-store-intrinsics
.. [EvlRFC] Explicit Vector Length RFC, https://reviews.llvm.org/D53613

Event Timeline

simoll retitled this revision from RFC: EVL Prototype & Roadmap for vector predication in LLVM to RFC: Prototype & Roadmap for vector predication in LLVM. Feb 13 2019, 7:39 AM
simoll edited the summary of this revision. (Show Details)

Renamed EVL to VP

programmerjake resigned from this revision. Feb 13 2019, 11:54 AM
rengolin edited reviewers, added: mkuper, rkruppe, fhahn; removed: programmerjake. Feb 14 2019, 2:57 AM
rengolin added subscribers: Ayal, hsaito.
chill added a subscriber: chill. Mar 16 2019, 2:01 AM
simoll updated this revision to Diff 191252. Mar 19 2019, 12:10 AM
  • re-based onto master
simoll updated this revision to Diff 195366. Apr 16 2019, 6:39 AM

Updates

  • added constrained fp intrinsics (IR level only).
  • initial support for mapping llvm.experimental.constrained.* intrinsics to llvm.vp.constrained.*.


Do we really need both vp.fadd() and vp.constrained.fadd()? Can't we just use the latter with rmInvalid/ebInvalid? That should prevent vp.constrained.fadd from losing optimizations w/o good reasons.
Do we have enough upside in having both?


Do we really need both vp.fadd() and vp.constrained.fadd()? Can't we just use the latter with rmInvalid/ebInvalid? That should prevent vp.constrained.fadd from losing optimizations w/o good reasons.

According to the LLVM langref, "fpexcept.ignore" seems to be the right option for exceptions whereas there is no "round.permissive" option for the rounding behavior. Abusing rmInvalid/ebInvalid seems hacky.

Do we have enough upside in having both?

I see no harm in having both since we already add the infrastructure in LLVM-VP to abstract away from specific instructions and/or intrinsics. Once (if ever) exception, rounding mode become available for native instructions (or can be an optional tag-on like fast-math flags), we can deprecate all constrained intrinsics and use llvm.vp.fdiv, etc or native instructions instead.


There is an indirect harm in adding more intrinsics with partially-redundant semantics: writing transformations and analyses requires logic that handles both forms. I recommend having fewer intrinsics where we can have fewer intrinsics.

Do we really need both vp.fadd() and vp.constrained.fadd()? Can't we just use the latter with rmInvalid/ebInvalid? That should prevent vp.constrained.fadd from losing optimizations w/o good reasons.

According to the LLVM langref, "fpexcept.ignore" seems to be the right option for exceptions whereas there is no "round.permissive" option for the rounding behavior. Abusing rmInvalid/ebInvalid seems hacky.

Then, please propose one more rounding mode, like round.permissive or round.any.


Yep. If one additional generally-useful rounding mode gets rid of several partially redundant intrinsics, that would be a good trade-off.

According to the LLVM langref, "fpexcept.ignore" seems to be the right option for exceptions whereas there is no "round.permissive" option for the rounding behavior. Abusing rmInvalid/ebInvalid seems hacky.

If you use "round.tonearest" that will get you the same semantics as the non-constrained version. The optimizer assumes round-to-nearest by default.

kpn added a subscriber: kpn. Apr 17 2019, 12:27 PM

Would it make sense to also update docs/AddingConstrainedIntrinsics.rst please?

simoll planned changes to this revision. Apr 18 2019, 1:33 AM

Thanks for your feedback!

Planned

  • Make the llvm.vp.constrained.* versions the only fp ops in vp. Encode default fp semantics by passing fpexcept.ignore and round.tonearest.
  • Update docs/AddingConstrainedIntrinsics.rst to account for the fact that llvm.experimental.constrained.* is no longer the only namespace for constrained intrinsics.
In D57504#1470705, @kpn wrote:

Would it make sense to also update docs/AddingConstrainedIntrinsics.rst please?

Sure. I don't think we should match (in the API) an llvm.vp.constrained.* intrinsic as a ConstrainedFPIntrinsic, though.
Conceptually, an llvm.vp.constrained.* intrinsic is both a VPIntrinsic and a ConstrainedFPIntrinsic. If the latter is used to transform them, ignoring the mask and vector length arguments along the way, we will see breakage (in the future, once there are transforms for constrained fp).

vkmr added a subscriber: vkmr. May 7 2019, 7:55 AM
simoll added a comment. Aug 7 2019, 2:09 AM

This is a "Keepalive" message - I will get back working on LLVM-VP in October.

Nice. By the way, another motivation could be std::simd: the overflow intrinsics, exposed as builtins, would allow us to provide a fast implementation of the masked variants of <simd>.

simoll added a comment. Oct 7 2019, 4:45 AM

Picking this up again. I will begin by changing the VP intrinsics as outlined before, with one deviation from the earlier plan:

  • There will be no llvm.vp.constrained.*, just llvm.vp.*, and all FP intrinsics will have an exception-mode and rounding-mode parameter.

This work was mentioned on the SVE discussion about predication, adding arm folks, just in case.

<<same mail sent to llvm-dev>>

Who is interested in a round table on vector predication at the '19 US DevMtg and/or would like to help organize one? There were some proposals for related round tables on the mailing list but not all of them have a time slot yet (VPlan, SVE, complex math, constrained fp, ..). I am eyeing the Wednesday, 11:55 slot, so please let me know if there is a schedule conflict I am not aware of.

Potential Topics:

  • Intersection with constrained-fp intrinsics and backend support (also complex arith).
  • Design of predicated reduction intrinsics (intersection with llvm.experimental.reduce[.v2].*).
  • Compatibility with SVE LLVM extension.
  • <Your topic here>
shawnl added a subscriber: shawnl. Oct 24 2019, 6:37 PM

Are predicated vector instructions not just a special case of DemandedBits? Why can't we leave out the .vp. intrinsics and just generate the predicate with DemandedBits? That way you do a predicated vector operation like so (in Zig). As the example makes clear, this optimization would have to be guaranteed in order for the generated code to be correct (as the predicate avoids a divide-by-zero error).

var notzero = v != 0;
if (std.vector.any(notzero)) {
    v = std.vector.select(5 / v, v, notzero);
}


What you describe is a workaround but not a solution for predicated SIMD in LLVM.
This approach may seem natural for SIMD ISAs, such as x86 SSE and ARM NEON, that do not have predication.
However, it is a bad fit for SIMD instruction sets that do support predicated SIMD (AVX512, ARM SVE, RISC-V V, NEC SX-Aurora).

As it turns out, it is more robust to have predicated instructions right in LLVM IR and convert them to the instruction+select pattern for SSE and friends than going the other way round.
This is what LLVM-VP proposes.
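
The difference can be made concrete with a scalar sketch (illustrative only): with a trapping operation such as integer division, the predicate must suppress the operation itself on inactive lanes; a real predicated divide never touches them, whereas a compute-then-select emulation would have to fix up the divisor beforehand.

```cpp
#include <array>
#include <cstddef>

// Reference model of the guarded divide from the Zig example above:
// active lanes (nonzero divisor) compute num/den, inactive lanes pass
// the old value through. The divide is never executed on a zero lane.
template <std::size_t N>
std::array<int, N> guarded_div(const std::array<int, N> &num,
                               const std::array<int, N> &den) {
  std::array<int, N> result{};
  for (std::size_t i = 0; i < N; ++i) {
    bool active = den[i] != 0; // the predicate
    if (active)
      result[i] = num[i] / den[i]; // only runs where the predicate holds
    else
      result[i] = den[i]; // select: keep the original lane value
  }
  return result;
}
```

Keeping this "operation is guarded" property attached to a separate select through the whole optimizer pipeline is exactly what turns out to be fragile; a predicated instruction carries it by construction.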

DevMtg Summary

  • There will be a separate RFC for the generalized pattern rewriting logic in LLVM-VP (see PatternMatch.h). We do this because it is useful for other efforts as well, e.g. to make the existing pattern rewrites in InstSimplify/InstCombine and DAGCombiner also work for constrained fp (@uweigand) and complex arithmetic (@greened). This may actually speed things up since we can pursue VP and generalized pattern matching in parallel.
  • @nhaehnle brought up that the LLVM-VP intrinsics should be convenient and natural to work with. The convenience wrappers (PredicatedInstruction, PredicatedBinaryOperator) and pattern rewrite generalizations already achieve this to a large extent. Specifically, there should be no "holes" when it comes to handling the intrinsics (e.g. it should not be necessary to resort to lower-level APIs (VPIntrinsic) when dealing with predicated SIMD). (To take something actionable from this, I think there should be an IRBuilder<>::CreateVectorFAdd(A, B, Mask, AVL, InsertPt), returning a PredicatedBinaryOperator, which may be either an FAdd instruction (or constrained fp..) or a llvm.vp.fadd intrinsic, depending on the fp environment, mask parameter, and vector length parameter.)
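
The CreateVectorFAdd idea from the second bullet could look roughly like the following toy sketch. All types and names here are hypothetical stand-ins (the real IRBuilder API is much richer); the point is only the dispatch: one entry point that emits a plain op when no predication is requested and a vp-style op otherwise.

```cpp
#include <memory>
#include <string>

// Toy stand-in for an IR value/instruction.
struct Value {
  std::string desc;
};

struct Builder {
  // If there is no mask and the vector length covers the whole vector,
  // an ordinary fadd suffices; otherwise emit a predicated vp.fadd.
  std::shared_ptr<Value> createVectorFAdd(const Value &a, const Value &b,
                                          const Value *mask, int evl,
                                          int vectorWidth) {
    if (!mask && evl == vectorWidth)
      return std::make_shared<Value>(Value{"fadd " + a.desc + ", " + b.desc});
    return std::make_shared<Value>(
        Value{"vp.fadd " + a.desc + ", " + b.desc + " (predicated)"});
  }
};
```

Callers never have to know which form was chosen, which is how the "no holes" requirement would be met.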


+1 on what Simon said.

There are lots of peephole rewrites like:

select ?, X, undef -> X

If we optimize away the select, we could end up incorrectly trapping on the no longer masked bits of X. This would be bad for the constrained intrinsics.

But also in the general case, it's very hard to keep a select glued to an operation through opt and llc.

+1 on what Simon said.

+1.

DevMtg Summary

  • There will be a separate RFC for the generalized pattern rewriting logic in LLVM-VP (see PatternMatch.h). We do this because it is useful for other efforts as well, eg to make the existing pattern rewrites in InstSimplify/Combine, DAGCombiner work also for constrained fp (@uweigand ) and complex arithmetic (@greened) . This may actually speedup things since we can pursue VP and generalized pattern match in parallel.

I'd like to rant a little bit to see if anyone agrees with my probably unpopular opinion...

Code explosion is the symptom, not the sickness. It's caused by using experimental intrinsics. Experimental intrinsics are a detriment to progress. They end up creating a ton more work and are designed to be inevitably replaced.

I do understand the desire for these intrinsics by some -- i.e. devs that don't care about these new features aren't impacted. I think that's short sighted though. Hiding complexity behind utility functions will be painful when debugging tough problems. And updating existing code to use pattern matchers is a lot of churn -- probably more churn than making the constructs first-class citizens.

IMHO, we'd be better off baking these new features into LLVM right from the start. These 3 topics are fairly significant features. It would be hard to argue that any one will go out of style in the foreseeable future...


Actually, the idea behind the generalized pattern code is to offer a way to gradually transition from intrinsics to native instruction support without disturbing transformations. The pattern matcher is templatized to match the intrinsics first (through utility classes). When the transition to native IR support is complete, one template instantiation of the pattern rewriter gets dropped and the code bloat is undone. E.g. in the case of VP, eventually PatternMatch will only ever be instantiated for the PredicatedContext and no longer for the special case of the EmptyContext. However, initially (in this patch) the pattern matcher is still instantiated for both kinds of context.
We can use the same mechanism to lift existing optimizations to complex arithmetic intrinsics. In that case, the matcher context would require that all constituent operations are complex number operators. The builder consuming the context will emit complex operations.

I do understand the desire for these intrinsics by some -- i.e. devs that don't care about these new features aren't impacted. I think that's short sighted though. Hiding complexity behind utility functions will be painful when debugging tough problems. And updating existing code to use pattern matchers is a lot of churn -- probably more churn than making the constructs first-class citizens.

Sure. You know, it's tempting to just duplicate all opcodes (Opcodes v2) and redesign them (all of them..) to support all of this from the start: a) masking (also for scalar ops), b) an active vector length, c) constrained fp.
If you want the existing transformations to work with Opcodes v2, you'd need exactly the same pattern generalizations, btw. In the end, whether it's native instructions or intrinsics does not matter that much.

IMHO, we'd be better off baking these new features into LLVM right from the start. These 3 topics are fairly significant features. It would be hard to argue that any one will go out of style in the foreseeable future...

I'd say LLVM is long past the starting line. If we just turn on predication on regular IR instructions, many existing instruction transformations will break. You'd need one monster commit that does the switch and fixes all these transformations at the same time.

Code explosion is the symptom, not the sickness. It's caused by using experimental intrinsics. Experimental intrinsics are a detriment to progress. They end up creating a ton more work and are designed to be inevitably replaced.

I think this is a big hammer argument for a nuanced topic.

We have used experimental intrinsics for a large number of disparate concepts, from exception handling to fuzzy vector extensions, and then after the semantics was defined and accepted, we baked the concepts into IR.

This is a proven track, and predication is a very similar example to past experiences, I see no contradiction here.

IMHO, we'd be better off baking these new features into LLVM right from the start. These 3 topics are fairly significant features. It would be hard to argue that any one will go out of style in the foreseeable future...

The risk of getting it wrong and having to re-bake into IR is high. We've done that with exception handling before and it wasn't pretty.

Predication is already in native IR form, albeit complex and error prone. The nuances across targets are too many to have a simple implementation working for everyone, and having a concrete implementation of the idea in intrinsic form may help clear up the issues before we stick to anything.

It's quite possible, and I really hope, that only a few targets will actually implement them, and that will be enough, so intrinsics will be short lived. Meanwhile, previous IR patterns will still match, so nothing is lost.

Of course, as with any intrinsic, it's quite possible that it will "just work" and people will give up half-way through. But history has shown that more often than not, these group efforts finish with a reasonable implementation, better than what we had before.

cheers,
--renato

+1 to what Renato said, I like this direction!
FWIW: we are working on Arm's M-profile Vector Extension (MVE), another vector extension for which this is very useful.

Code explosion is the symptom, not the sickness. It's caused by using experimental intrinsics. Experimental intrinsics are a detriment to progress. They end up creating a ton more work and are designed to be inevitably replaced.

I think this is a big hammer argument for a nuanced topic.

That's fair. But we can talk specifics too. We already have a lot of this functional and optimized in Clang. E.g.:

#include <stdio.h>

#pragma STDC FENV_ACCESS ON

void foo(double a[], double b[]) {
  double res[8];
  for(int i = 0; i < 8; i++)
    if (b[i] != 0.0)
      res[i] = a[i] / b[i];

  printf("%f\n", res[0]);
}
vmovupd (%rsi), %zmm0           #  test.c:8:9
vxorpd  %xmm1, %xmm1, %xmm1     #  test.c:8:14
vcmpneqpd       %zmm1, %zmm0, %k1 #  test.c:8:14
vmovupd (%rdi), %zmm1 {%k1} {z} #  test.c:9:16
vdivpd  %zmm0, %zmm1, %zmm0 {%k1} #  test.c:9:21
vmovupd %zmm0, (%rsp) {%k1}     #  test.c:9:14
vmovsd  (%rsp), %xmm0           #  test.c:11:18

That said, there's a large amount of technical debt from carrying these changes locally. I'd like to get out of that debt. That's why I'd like to avoid the experimental intrinsics detour.

I will also note that we care about a limited number of targets. So to be fair, take that into consideration.

We have used experimental intrinsics for a large number of disparate concepts, from exception handling to fuzzy vector extensions, and then after the semantics was defined and accepted, we baked the concepts into IR.

This is a proven track, and predication is a very similar example to past experiences, I see no contradiction here.

That's a fair argument too. I wasn't monitoring those projects, so I don't know the specifics.

Predication, Complex, and FPEnv require a massive amount of intrinsics to work though. Pretty much duplicating every operator (and target specific intrinsic for FPEnv). And probably some others I've forgotten. That seems like an unreasonable amount of intrinsics to me. But if others with experience in experimental intrinsics think it's manageable, I can't really argue.

IMHO, we'd be better off baking these new features into LLVM right from the start. These 3 topics are fairly significant features. It would be hard to argue that any one will go out of style in the foreseeable future...

The risk of getting it wrong and having to re-bake into IR is high. We've done that with exception handling before and it wasn't pretty.

Predication is already in native IR form, albeit complex and error prone. The nuances across targets are too many to have a simple implementation working for everyone, and having a concrete implementation of the idea in intrinsic form may help clear up the issues before we stick to anything.

It's quite possible, and I really hope, that only a few targets will actually implement them, and that will be enough, so intrinsics will be short lived. Meanwhile, previous IR patterns will still match, so nothing is lost.

Of course, as with any intrinsic, it's quite possible that it will "just work" and people will give up half-way through. But history has shown that more often than not, these group efforts finish with a reasonable implementation, better than what we had before.

Another good argument. And this isn't really the hill I want to die on. But it just seems silly to me to implement something twice: Occam's razor. We'll have to work the kinks out somewhere -- so why not push directly to the goal...

But it just seems silly to me to implement something twice: Occam's razor. We'll have to work the kinks out somewhere -- so why not push directly to the goal...

I see where you're coming from, but hindsight is 20/20. Implementing something twice, when the first one is a prototype means you can make a lot of mistakes on the first iteration.

If the cost of changing the IR outweighs the prototyping costs (it usually does), then the overall cost is lower, even if for a longer period.

The current proposals are interlinked, so I don't think there will be combinatorial explosion, or even multiplication of intrinsics. I hope that we'll figure out the best way to represent that into IR sooner because of that.

This is not the first time that we try to get those into IR proper, either. All previous times we started with "change the IR" approach and could never get into agreement.

Intrinsics give us the prototype route: low implementation cost, low impact, easy to clean up later. It does add clutter in between, but that impact can also be limited to one or two targets of the willing sub-communities.

LLVM is a very fast moving target, stopping the world to get the IR "right" doesn't work.

A good example to look for is the scalable vector IR changes that have gone through multiple attempts and are going on for many years and still not complete...

These things take time, rushing it usually backfires. :)


Ha, yeah. All good points. I'll let this drop...

simoll updated this revision to Diff 226913. Oct 29 2019, 9:34 AM
Updates
  • Fixed several intrinsic attributes.
  • All fp intrinsics are constrained (identically to the llvm.constrained.* ones). They behave like regular fp ops if fpexcept.ignore is passed.
  • Bitcode verifier test.
Observations
  • When using fpexcept.ignore, the fp callsites should have the readnone attribute set on them to override the inaccessiblememonly of the intrinsic declarations. That way DCE still works.
  • The rules for constrained fp (strictfp on the function definition, only constrained fp in that function) apply only if there is a single fp op with exceptions in the function. That is, strictfp is not necessary when all fp ops have fpexcept.ignore.
  • When the exception behavior is not fpexcept.ignore, the fp op of the intrinsic is not revealed (getFunctionalOpcode(..) returns Call in that case).
  • (FIXME) NoCapture does not work on vectors of pointers.
Next steps

As mentioned earlier, generalized pattern matching will be part of a separate RFC (although its still included in this reference implementation).
I'd like to discuss the actual intrinsic signatures next. For that I will upload a new minimal patch for integer intrinsic support.

Hi Simon, I went through the code for the first time, and this is a first round of proper nitpicks from my side. Please ignore if you want to focus on the bigger picture at this point in the discussion, but these are just some things I noticed. General nitpick is that you should run clang-format as there are quite a few coding style issues: indentation, indentation of arguments, exceeding 80 columns, placement of * and & in arguments and return values, etc. And find some more nitpicks inlined.

llvm/include/llvm/CodeGen/ISDOpcodes.h
489

I was unfamiliar with this one... I think I know what it does, and how it is different from VP_SELECT, but for clarity, can you define what integer pivot is?

492

typo: hether

1134

just spell out 'otherwise' here, and also below.

llvm/include/llvm/CodeGen/SelectionDAGNodes.h
727

Perhaps outdated comment? Should it be something along the lines of 'vector predicated node' instead of 'explicit vector length node'?

1492

indentation of || off by 1?

2357

VP_LOAD and VP_STORE?

2386

same?

2426

".. does a truncation before store" sounds a bit odd. Since 'truncating store' is a well-known term, and you explain what it is for ints/floats below, I think it suffices to say "Return true if this is a truncating store. For integers ..."

    1. Observations
  • When using fpexcept.ignore, the fp callsites should have the readnone attribute set on them to override the inaccessiblememonly of the intrinsic declarations. That way DCE still works.

Wouldn't that allow the call to be moved relative to other calls? Specifically, we need to make sure intrinsics aren't moved relative to calls that change the rounding mode. The "inaccessiblememonly" attribute is meant to model both the reading of control modes and the possible setting of status flags or raising of exceptions.

  • The rules for constrained fp (strictfp on the function definition, only constrained fp in that function) apply only if there is a single fp op with exceptions in the function. That is strictfp is not necessary when all fp ops have fpexcept.ignore.

I don't think this is right. Even if there are no constrained FP operations in the function we might have math library calls for which the strictfp attribute is needed to prevent libcall simplification and constant folding that might violate the rounding mode.


I see. Since we need to model the default fp environment with these intrinsics (and this is our priority), let me make the following suggestion: VP intrinsics will have a rounding mode and exception behavior argument from the start, but the only allowed values are "round.tonearest" and "fpexcept.ignore". Once we have a solution for the general case implemented for constrained fp, we will unlock that feature for LLVM VP as well.
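
Under that suggestion, a VP fp intrinsic might look like the following sketch (intrinsic name and operand order are assumptions; only the default-environment metadata values would be accepted initially):

```llvm
; hypothetical: a VP fadd carrying constrained-fp metadata operands,
; restricted to the default fp environment for now
%r = call <8 x float> @llvm.vp.fadd.v8f32(
         <8 x float> %x, <8 x float> %y,
         metadata !"round.tonearest", metadata !"fpexcept.ignore",
         <8 x i1> %mask, i32 %evl)
```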

simoll marked 8 inline comments as done.Oct 30 2019, 2:06 AM

Hi Simon, I went through the code for the first time, and this is a first round of proper nitpicks from my side. Please ignore if you want to focus on the bigger picture at this point in the discussion, but these are just some things I noticed. General nitpick is that you should run clang-format as there are quite a few coding style issues: indentation, indentation of arguments, exceeding 80 columns, placement of * and & in arguments and return values, etc. And find some more nitpicks inlined.

Hi Sjoerd, thanks for your comments! I've fixed the inline nitpicks right away. I'll do a style pass for the actual commits.

Cheers. Just curious, what are your next steps? People can correct me if I'm wrong, but my impression is that with the RFC, this prototype, and the discussion at the US LLVM dev conference, there is consensus and people are on board with the general idea and direction. There are still some discussions on e.g. the (constrained) FP part, but would it now be time to split this up into separate commits, for example an INT and an FP part (if that makes sense), so that they can be progressed separately?

simoll added a comment.EditedNov 4 2019, 4:30 AM

Yes, I hope that's where things are right now ;-) I am planning to go by functional slices. Each slice comes with IR-level intrinsics, TTI support, basic lowering to standard IR, SelectionDAG support, and tests.

I am preparing the first patchset for integer support atm.

Slices:

  • Integer slice.
  • Memory slice.
  • Reduction slice.
  • FP (with unconstrained metadata args) slice.

Standalone patch:

  • Mask, VectorLength and Passthru attributes (in preparation of vector function calls).

Pending discussion/separate RFC:

  • Constrained FP (being able to fully optimize constrained fp intrinsics in the default fp env).
  • Generalized pattern match (aka optimizing VP).
simoll added a comment.Nov 6 2019, 1:13 AM

D69552: Move floating point related entities to namespace level contains the fp enum changes required for LLVM-VP. Referencing the patch here.

simoll updated this revision to Diff 228052.Nov 6 2019, 6:27 AM

Fixed attribute placements, signatures, more tests, ..
This is in sync with the subpatch #1 of the integer slice (https://reviews.llvm.org/D69891).

simoll added a comment.Nov 6 2019, 6:29 AM

Integer slice patches

#1 IR-level support: https://reviews.llvm.org/D69891
#2 TTI & Legalization: <stay tuned>
#3 ISel patch: <stay tuned>

I'll update this comment as we go to keep track of the integer slice.

simoll updated this revision to Diff 228885.Nov 12 2019, 6:48 AM

Changes

  • VPIntrinsics.def file.
  • Pass vlen i32 -1 to enable all lanes with scalable vector types.
  • Various NFC fixes.
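
For scalable types, passing -1 as the vector length enables all lanes; a sketch using the prototype's intrinsic naming:

```llvm
; %evl = -1 enables all lanes, the natural choice for scalable vectors
; whose lane count is unknown at compile time (sketch; values assumed)
%r = call <vscale x 4 x i32> @llvm.vp.add.nxv4i32(
         <vscale x 4 x i32> %a, <vscale x 4 x i32> %b,
         <vscale x 4 x i1> %m, i32 -1)
```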

Moving the discussion from the integer patch to the main RFC, as this is about the general design of VP intrinsics. It's about having a passthru operand (as in llvm.masked.load) and whether %evl should be a parameter of the intrinsics or modelled differently.

@SjoerdMeijer https://reviews.llvm.org/D69891#inline-636845
and if I'm not mistaken we are now discussing if undef here should be undef or a passthru

@rkruppe https://reviews.llvm.org/D69891#inline-637215
I previously felt that passthru would be nice to have for backend maintainers (including myself) but perhaps not worth the duplication of IR functionality (having two ways to do selects). However, given the differences I just described, I don't think "just use select" is workable.

Ok. I do agree that having passthru simplifies isel for certain architectures (and legal combinations of passthru value, type, and operations...), but:
VP intrinsics aren't target intrinsics: they are not supposed to be a way to directly program any specific ISA in the way of a macroassembler, as you would with llvm.x86.* or llvm.arm.mve.* or any other. Rather, think of them as regular IR instructions. Pretend that anything we propose in VP intrinsics will end up as a feature of first-class LLVM instructions. Based on that, I figured that one VP intrinsic should match one IR instruction plus predication, nothing more.

  • If we had predicated IR instructions, would we want them to have a passthru operand?
  • The prototype shows that defining VP intrinsics with undef-on-masked-out makes it straightforward to generalize InstSimplify/InstCombine/DAGCombiner so that they can optimize VP intrinsics. If you add a passthru operand, then logically VP intrinsics start to behave like two instructions: that could be made to work, but it would be messier as you'd have to peek through selects, etc.
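
Without a passthru operand, merging semantics are expressed as a separate select that backends can pattern-match back into a merging instruction; a sketch with assumed names:

```llvm
; undef-on-masked-out VP operation ...
%v = call <8 x i32> @llvm.vp.add.v8i32(<8 x i32> %a, <8 x i32> %b,
                                       <8 x i1> %m, i32 %evl)
; ... followed by an explicit merge with the passthru value
%merged = select <8 x i1> %m, <8 x i32> %v, <8 x i32> %passthru
```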

@sdesmalen https://reviews.llvm.org/D69891#1750287
If we want to solve the select issue and also keep the intrinsics simple, my suggestion was to combine the explicit vector length with the mask using an explicit intrinsic like @llvm.vp.enable.lanes. Because this is an explicit intrinsic, the code-generator can simply extract the %evl parameter and pass that directly to the instructions for RVV/SXA. This is what happens for many other intrinsics in LLVM already, like masked.load/masked.gather that support only a single addressing mode, where it is up to the code-generator to pick apart the value into operands that are suited for a more optimal load instruction.

Without having heard your thoughts on this suggestion, I would have to guess that your reservation is the possibility of LLVM hoisting/separating the logic that merges predicate mask and %evl value in some way. That would mean having to do some tricks (think CodeGenPrep) to keep the values together and recognizable for CodeGen. And that's the exact same thing we would like to avoid for supporting merging/zeroing predication, hence the suggestion for the explicit passthru parameter.

That's not quite the same:
%evl is mapped to a hardware register on SX-Aurora. We cannot simply reconstitute the %evl from any given mask; if %evl is obscured, all operations that depend on it become less efficient because we need to default to the full vector length. Now, if the select is separated from the VP intrinsic, you simply emit one select instruction, and it should be possible to hoist it back and merge it with the VP intrinsic in most cases (you probably want an optimization that does that anyway, because there will be code with explicit selects even with passthru). Besides, if the select is folded with an instruction that is subsequently simpler, then that's actually an argument in favor of explicit selects: passthru makes this implicit.

  • If we had predicated IR instructions, would we want them to have a passthru operand?

I think that would probably be a reasonable clean-slate IR design, though I am not at all sure if it would be better. I wasn't specifically advocating for passthru operands, though. I agree that not having passthru and performing the same function in two operations can be readily pattern-matched by backends at modest effort and failing to match it has low cost. My main point was just that the existing select instruction is not sufficient as the second operation, for essentially the same reason why the VP intrinsics have an EVL argument instead of just the mask. Creating a VP equivalent of select (as already sketched in the other thread) resolves that concern just as well.

simoll marked an inline comment as done.Tue, Dec 3, 4:18 AM
simoll added inline comments.
llvm/docs/LangRef.rst
15283–15300

@rkruppe
[..] My main point was just that the existing select instruction is not sufficient as the second operation, for essentially the same reason why the VP intrinsics have an EVL argument instead of just the mask. Creating a VP equivalent of select (as already sketched in the other thread) resolves that concern just as well.

I agree. The prototype has defined such an llvm.vp.select from the get-go.

rkruppe added inline comments.Tue, Dec 3, 10:06 AM
llvm/docs/LangRef.rst
15283–15300

Oops, missed that / forgot about it. Sorry for the noise.

Is there a reason why it's not in the "integer slice" patch? It's not integer-specific, but it seems to fit even less into the other slices.

simoll marked an inline comment as done.Mon, Dec 9, 12:52 AM
simoll added inline comments.
llvm/docs/LangRef.rst
15283–15300

For one, I wanted to keep the integer patch concise. Also, having played around with this for a while now, I think that the signature of vp.select should be:

llvm.vp.select(<W x i1> %m, %onTrue, %onFalse, i32 %threshold, i32 vlen %evl)

meaning that values from %onTrue are selected where %m is true and the lane index is below %threshold; %onFalse is selected otherwise. Lane indices greater than or equal to %evl are undef, as ever. In short: there is just one "merge" operation and no separate vp.compose anymore.
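
In IR, that proposed signature would read roughly as follows (sketch; the per-lane semantics from above are spelled out in the comments):

```llvm
; lane i:  i >= %evl                 -> undef
;          %m[i] and i < %threshold  -> %onTrue[i]
;          otherwise                 -> %onFalse[i]
%r = call <8 x i32> @llvm.vp.select.v8i32(
         <8 x i1> %m, <8 x i32> %onTrue, <8 x i32> %onFalse,
         i32 %threshold, i32 %evl)
```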