This is an archive of the discontinued LLVM Phabricator instance.

RFC: Prototype & Roadmap for vector predication in LLVM
Changes Planned · Public

Authored by simoll on Jan 31 2019, 3:12 AM.

Details

Summary

Vector Predication Roadmap

This proposal defines a roadmap towards native vector predication in LLVM, specifically for vector instructions with a mask and/or an explicit vector length.
LLVM currently has no target-independent means to model predicated vector instructions for modern SIMD ISAs such as AVX512, ARM SVE, the RISC-V V extension and NEC SX-Aurora.
Only some predicated vector operations, such as masked loads and stores, are available through intrinsics [MaskedIR]_.

Please use docs/Proposals/VectorPredication.rst to comment on the summary.

Vector Predication intrinsics

The prototype in this patch demonstrates the following concepts:

  • Predicated vector intrinsics with an explicit mask and vector length parameter on IR level (see the sketch after this list).
  • First-class predicated SDNodes on ISel level. Mask and vector length are value operands.
  • An incremental strategy to generalize PatternMatch/InstCombine/InstSimplify and DAGCombiner to work on both regular instructions and VP intrinsics.
  • DAGCombiner example: FMA fusion.
  • InstCombine/InstSimplify example: FSub pattern re-writes.
  • Early experiments on the LNT test suite (Clang release static build, -O3 -ffast-math) indicate that compile time on non-VP IR is not affected by the API abstractions in PatternMatch, etc.
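For illustration, a minimal sketch of a call to one of these intrinsics (the mangling and parameter order are assumptions based on this prototype; the mask and the explicit vector length are passed as trailing value arguments):

%r = call <8 x i32> @llvm.vp.add.v8i32(<8 x i32> %a, <8 x i32> %b, <8 x i1> %mask, i32 %evl)
; lanes where %mask is false, or whose index is >= %evl, yield undef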

Roadmap

Drawing from the prototype, we propose the following roadmap towards native vector predication in LLVM:

1. IR-level VP intrinsics

  • There is a consensus on the semantics/instruction set of VP intrinsics.
  • VP intrinsics and attributes are available on IR level.
  • TTI has capability flags for VP (tentatively `supportsVP()`, `haveActiveVectorLength()`).

Result: VP usable for IR-level vectorizers (LV, VPlan, RegionVectorizer), potential integration in Clang with builtins.

2. CodeGen support

  • VP intrinsics translate to first-class SDNodes (`llvm.vp.fdiv.* -> vp_fdiv`).
  • VP legalization (legalize the explicit vector length to a mask (AVX512), legalize VP SDNodes to pre-existing ones (SSE, NEON)); a sketch of the EVL-to-mask case follows this list.
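The EVL-to-mask case could conceptually proceed as in the following IR-level sketch (an assumption for an 8-lane operation; a target without an explicit vector length folds %evl into the mask):

%evl.ins   = insertelement <8 x i32> undef, i32 %evl, i32 0
%evl.splat = shufflevector <8 x i32> %evl.ins, <8 x i32> undef, <8 x i32> zeroinitializer
%evl.mask  = icmp ult <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>, %evl.splat
%new.mask  = and <8 x i1> %mask, %evl.mask
; %new.mask now drives a plain masked operation; the EVL operand becomes redundant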

Result: Backend development based on VP SDNodes.

3. Lift InstSimplify/InstCombine/DAGCombiner to VP

  • Introduce PredicatedInstruction, PredicatedBinaryOperator, etc. helper classes that match both standard vector IR and VP intrinsics.
  • Add a matcher context to PatternMatch and context-aware IR Builder APIs.
  • Incrementally lift DAGCombiner to work on VP SDNodes as well as on regular vector instructions.
  • Incrementally lift InstCombine/InstSimplify to operate on VP as well as regular IR instructions (see the FSub sketch after this list).
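For illustration, the FSub re-write from the prototype generalized to VP, as a sketch (the scalar fold 'x - x -> 0' needs appropriate fast-math flags; disabled lanes are undef, so refining them to zero is legal):

; standard fold:  fsub nnan <4 x float> %x, %x  -->  zeroinitializer
%r = call nnan <4 x float> @llvm.vp.fsub.v4f32(<4 x float> %x, <4 x float> %x, <4 x i1> %m, i32 %evl)
; the generalized matcher applies the same fold:  %r  -->  zeroinitializer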

Result: Optimization of VP intrinsics on par with standard vector instructions.

4. Deprecate llvm.masked.* / llvm.experimental.reduce.*

  • Modernize llvm.masked.* / llvm.experimental.reduce.* by translating them to VP (see the sketch after this list).
  • DCE transitional APIs.
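A sketch of the intended translation (the vp.load signature is an assumption of this proposal; note how the passthru operand of llvm.masked.load becomes an explicit select):

; before:
%v = call <8 x float> @llvm.masked.load.v8f32.p0v8f32(<8 x float>* %p, i32 4, <8 x i1> %m, <8 x float> %passthru)
; after:
%l  = call <8 x float> @llvm.vp.load.v8f32.p0v8f32(<8 x float>* %p, <8 x i1> %m, i32 8)
%v2 = select <8 x i1> %m, <8 x float> %l, <8 x float> %passthru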

Result: VP has superseded earlier vector intrinsics.

5. Predicated IR Instructions

  • Vector instructions have an optional mask and vector length parameter. These lower to VP SDNodes (from Stage 2); a hypothetical syntax sketch follows this list.
  • Phase out VP intrinsics, only keeping those that are not equivalent to vectorized scalar instructions (reduce, shuffles, ..).
  • InstCombine/InstSimplify expect predication in regular Instructions (Stage (3) has laid the groundwork).
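Purely as a hypothetical illustration (no concrete syntax is part of this patch; any real syntax would be decided in a later RFC), such an instruction might read:

%r = fadd <8 x float> %a, %b, mask(%m), vlen(%evl)   ; hypothetical syntax; lowers to a vp_fadd SDNode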

Result: Native vector predication in IR.

References

.. [MaskedIR] llvm.masked.* intrinsics, https://llvm.org/docs/LangRef.html#masked-vector-load-and-store-intrinsics
.. [EvlRFC] Explicit Vector Length RFC, https://reviews.llvm.org/D53613


Event Timeline

chill added a subscriber: chill.Mar 16 2019, 2:01 AM
simoll updated this revision to Diff 191252.Mar 19 2019, 12:10 AM
  • re-based onto master
simoll updated this revision to Diff 195366.Apr 16 2019, 6:39 AM

Updates

  • added constrained fp intrinsics (IR level only).
  • initial support for mapping llvm.experimental.constrained.* intrinsics to llvm.vp.constrained.*.


Updates

  • added constrained fp intrinsics (IR level only).
  • initial support for mapping llvm.experimental.constrained.* intrinsics to llvm.vp.constrained.*.

Do we really need both vp.fadd() and vp.constrained.fadd()? Can't we just use the latter with rmInvalid/ebInvalid? That should prevent vp.constrained.fadd from losing optimizations w/o good reasons.
Do we have enough upside in having both?

Updates

  • added constrained fp intrinsics (IR level only).
  • initial support for mapping llvm.experimental.constrained.* intrinsics to llvm.vp.constrained.*.

Do we really need both vp.fadd() and vp.constrained.fadd()? Can't we just use the latter with rmInvalid/ebInvalid? That should prevent vp.constrained.fadd from losing optimizations w/o good reasons.

According to the LLVM LangRef, "fpexcept.ignore" seems to be the right option for exceptions, whereas there is no "round.permissive" option for the rounding behavior. Abusing rmInvalid/ebInvalid seems hacky.

Do we have enough upside in having both?

I see no harm in having both since we already add the infrastructure in LLVM-VP to abstract away from specific instructions and/or intrinsics. Once (if ever) exception and rounding modes become available for native instructions (or can be an optional tag-on like fast-math flags), we can deprecate all constrained intrinsics and use llvm.vp.fdiv etc. or native instructions instead.

Updates

  • added constrained fp intrinsics (IR level only).
  • initial support for mapping llvm.experimental.constrained.* intrinsics to llvm.vp.constrained.*.

Do we really need both vp.fadd() and vp.constrained.fadd()? Can't we just use the latter with rmInvalid/ebInvalid? That should prevent vp.constrained.fadd from losing optimizations w/o good reasons.

According to the LLVM LangRef, "fpexcept.ignore" seems to be the right option for exceptions, whereas there is no "round.permissive" option for the rounding behavior. Abusing rmInvalid/ebInvalid seems hacky.

Do we have enough upside in having both?

I see no harm in having both since we already add the infrastructure in LLVM-VP to abstract away from specific instructions and/or intrinsics. Once (if ever) exception and rounding modes become available for native instructions (or can be an optional tag-on like fast-math flags), we can deprecate all constrained intrinsics and use llvm.vp.fdiv etc. or native instructions instead.

There is an indirect harm in adding more intrinsics with partially-redundant semantics: writing transformations and analyses requires logic that handles both forms. I recommend having fewer intrinsics where we can have fewer intrinsics.

Do we really need both vp.fadd() and vp.constrained.fadd()? Can't we just use the latter with rmInvalid/ebInvalid? That should prevent vp.constrained.fadd from losing optimizations w/o good reasons.

According to the LLVM LangRef, "fpexcept.ignore" seems to be the right option for exceptions, whereas there is no "round.permissive" option for the rounding behavior. Abusing rmInvalid/ebInvalid seems hacky.

Then, please propose one more rounding mode, like round.permissive or round.any.

Updates

  • added constrained fp intrinsics (IR level only).
  • initial support for mapping llvm.experimental.constrained.* intrinsics to llvm.vp.constrained.*.

Do we really need both vp.fadd() and vp.constrained.fadd()? Can't we just use the latter with rmInvalid/ebInvalid? That should prevent vp.constrained.fadd from losing optimizations w/o good reasons.

According to the LLVM LangRef, "fpexcept.ignore" seems to be the right option for exceptions, whereas there is no "round.permissive" option for the rounding behavior. Abusing rmInvalid/ebInvalid seems hacky.

Do we have enough upside in having both?

I see no harm in having both since we already add the infrastructure in LLVM-VP to abstract away from specific instructions and/or intrinsics. Once (if ever) exception and rounding modes become available for native instructions (or can be an optional tag-on like fast-math flags), we can deprecate all constrained intrinsics and use llvm.vp.fdiv etc. or native instructions instead.

There is an indirect harm in adding more intrinsics with partially-redundant semantics: writing transformations and analyses requires logic that handles both forms. I recommend having fewer intrinsics where we can have fewer intrinsics.

Yep. If one additional generally-useful rounding mode gets rid of several partially redundant intrinsics, that would be a good trade-off.

According to the LLVM LangRef, "fpexcept.ignore" seems to be the right option for exceptions, whereas there is no "round.permissive" option for the rounding behavior. Abusing rmInvalid/ebInvalid seems hacky.

If you use "round.tonearest" that will get you the same semantics as the non-constrained version. The optimizer assumes round-to-nearest by default.

kpn added a subscriber: kpn.Apr 17 2019, 12:27 PM

Would it make sense to also update docs/AddingConstrainedIntrinsics.rst please?

simoll planned changes to this revision.Apr 18 2019, 1:33 AM

Thanks for your feedback!

Planned

  • Make the llvm.vp.constrained.* versions the only fp ops in vp. Encode default fp semantics by passing fpexcept.ignore and round.tonearest.
  • Update docs/AddingConstrainedIntrinsics.rst to account for the fact that llvm.experimental.constrained.* is no longer the only namespace for constrained intrinsics.
In D57504#1470705, @kpn wrote:

Would it make sense to also update docs/AddingConstrainedIntrinsics.rst please?

Sure. I don't think we should match (in the API) an llvm.vp.constrained.* intrinsic as ConstrainedFPIntrinsic though.
Conceptually, an llvm.vp.constrained.* intrinsic surely is both a VPIntrinsic and a ConstrainedFPIntrinsic. If the latter is used to transform them, ignoring the mask and vector length arguments along the way, we'll see breakage (in the future, once there are transforms for constrained fp).

vkmr added a subscriber: vkmr.May 7 2019, 7:55 AM
simoll added a comment.Aug 7 2019, 2:09 AM

This is a "Keepalive" message - I will get back working on LLVM-VP in October.

Nice. Btw, another motivation could be std::simd. Here the overflow intrinsics exposed as builtins would allow us to provide a fast implementation of the masked variants of <simd>.

simoll added a comment.Oct 7 2019, 4:45 AM

Picking this up again. I will begin by changing the VP intrinsics as outlined before, with one deviation from the earlier plan:

  • There will be no llvm.vp.constrained.*, just llvm.vp.*, and all FP intrinsics will have an exception-mode and a rounding-mode parameter.

This work was mentioned on the SVE discussion about predication, adding arm folks, just in case.

<<~same mail sent to llvm-dev>>

Who is interested in a round table on vector predication at the '19 US DevMtg and/or would like to help organize one? There were some proposals for related round tables on the mailing list but not all of them have a time slot yet (VPlan, SVE, complex math, constrained fp, ..). I am eyeing the Wednesday, 11:55 slot, so please let me know if there is a schedule conflict I am not aware of.

Potential Topics:

  • Intersection with constrained-fp intrinsics and backend support (also complex arith).
  • Design of predicated reduction intrinsics (intersection with llvm.experimental.reduce[.v2].*).
  • Compatibility with SVE LLVM extension.
  • <Your topic here>
shawnl added a subscriber: shawnl.Oct 24 2019, 6:37 PM

Are predicated vector instructions not just a special case of DemandedBits? Why can't we leave out the .vp. intrinsics and just generate the predicate with DemandedBits? That way you do a predicated vector operation like so (in Zig). As the example makes clear, this optimization would have to be guaranteed in order for the generated code to be correct (as the predicate avoids a divide-by-zero error).

var notzero = v != 0;
if (std.vector.any(notzero)) {
    v = std.vector.select(5 / v, v, notzero);
}

Are predicated vector instructions not just a special case of DemandedBits? Why can't we leave out the .vp. intrinsics and just generate the predicate with DemandedBits? That way you do a predicated vector operation like so (in Zig). As the example makes clear, this optimization would have to be guaranteed in order for the generated code to be correct (as the predicate avoids a divide-by-zero error).

var notzero = v != 0;
if (std.vector.any(notzero)) {
    v = std.vector.select(5 / v, v, notzero);
}

What you describe is a workaround but not a solution for predicated SIMD in LLVM.
This approach may seem natural considering SIMD ISAs that do not have predication, such as x86 SSE and ARM NEON.
It is however a bad fit for SIMD instruction sets that do support predicated SIMD (AVX512, ARM SVE, RISC-V V, NEC SX-Aurora).

As it turns out, it is more robust to have predicated instructions right in LLVM IR and convert them to the instruction+select pattern for SSE and friends than going the other way round.
This is what LLVM-VP proposes.
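For illustration, the guarded division above maps to a single predicated divide in VP form (a sketch assuming the proposed llvm.vp.sdiv signature); no lane with a zero divisor is executed, so the guarantee is structural rather than an optimization:

%notzero = icmp ne <4 x i32> %v, zeroinitializer
%quot    = call <4 x i32> @llvm.vp.sdiv.v4i32(<4 x i32> <i32 5, i32 5, i32 5, i32 5>, <4 x i32> %v, <4 x i1> %notzero, i32 4)
%v.new   = select <4 x i1> %notzero, <4 x i32> %quot, <4 x i32> %v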

DevMtg Summary

  • There will be a separate RFC for the generalized pattern rewriting logic in LLVM-VP (see PatternMatch.h). We do this because it is useful for other efforts as well, e.g. to make the existing pattern rewrites in InstSimplify/Combine and DAGCombiner work also for constrained fp (@uweigand) and complex arithmetic (@greened). This may actually speed up things since we can pursue VP and generalized pattern match in parallel.
  • @nhaehnle brought up that the LLVM-VP intrinsics should be convenient and natural to work with. The convenience wrappers (PredicatedInstruction, PredicatedBinaryOperator) and pattern rewrite generalizations already achieve this to a large extent. Specifically, there should be no "holes" when it comes to handling the intrinsics (e.g. it should not be necessary to resort to lower-level APIs (VPIntrinsic) when dealing with predicated SIMD). (To take something actionable from this, I think there should be an IRBuilder<>::CreateVectorFAdd(A, B, Mask, AVL, InsertPt), returning a PredicatedBinaryOperator, which may either be an FAdd instruction (or constrained fp..) or a llvm.vp.fadd intrinsic, depending on the fp environment, mask parameter, and vector length parameter; see the sketch below.)
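Such a builder could emit either of the following forms, sketched here under this prototype's assumed signatures (constrained-fp arguments omitted):

; default fp environment, no mask/EVL given:
%r0 = fadd <8 x float> %a, %b
; mask and/or EVL given:
%r1 = call <8 x float> @llvm.vp.fadd.v8f32(<8 x float> %a, <8 x float> %b, <8 x i1> %m, i32 %evl)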

Are predicated vector instructions not just a special case of DemandedBits? Why can't we leave out the .vp. intrinsics and just generate the predicate with DemandedBits? That way you do a predicated vector operation like so (in Zig). As the example makes clear, this optimization would have to be guaranteed in order for the generated code to be correct (as the predicate avoids a divide-by-zero error).

var notzero = v != 0;
if (std.vector.any(notzero)) {
    v = std.vector.select(5 / v, v, notzero);
}

What you describe is a workaround but not a solution for predicated SIMD in LLVM.
This approach may seem natural considering SIMD ISAs that do not have predication, such as x86 SSE and ARM NEON.
It is however a bad fit for SIMD instruction sets that do support predicated SIMD (AVX512, ARM SVE, RISC-V V, NEC SX-Aurora).

As it turns out, it is more robust to have predicated instructions right in LLVM IR and convert them to the instruction+select pattern for SSE and friends than going the other way round.
This is what LLVM-VP proposes.

+1 on what Simon said.

There are lots of peeps (peephole optimizations) like:

select ?, X, undef -> X

If we optimize away the select, we could end up incorrectly trapping on the no longer masked bits of X. This would be bad for the constrained intrinsics.

But also in the general case, it's very hard to keep a select glued to an operation through opt and llc.

+1 on what Simon said.

+1.

DevMtg Summary

  • There will be a separate RFC for the generalized pattern rewriting logic in LLVM-VP (see PatternMatch.h). We do this because it is useful for other efforts as well, e.g. to make the existing pattern rewrites in InstSimplify/Combine and DAGCombiner work also for constrained fp (@uweigand) and complex arithmetic (@greened). This may actually speed up things since we can pursue VP and generalized pattern match in parallel.

I'd like to rant a little bit to see if anyone agrees with my probably unpopular opinion...

Code explosion is the symptom, not the sickness. It's caused by using experimental intrinsics. Experimental intrinsics are a detriment to progress. They end up creating a ton more work and are designed to be inevitably replaced.

I do understand the desire for these intrinsics by some -- i.e. devs that don't care about these new features aren't impacted. I think that's short sighted though. Hiding complexity behind utility functions will be painful when debugging tough problems. And updating existing code to use pattern matchers is a lot of churn -- probably more churn than making the constructs first-class citizens.

IMHO, we'd be better off baking these new features into LLVM right from the start. These 3 topics are fairly significant features. It would be hard to argue that any one will go out of style in the foreseeable future...

DevMtg Summary

  • There will be a separate RFC for the generalized pattern rewriting logic in LLVM-VP (see PatternMatch.h). We do this because it is useful for other efforts as well, e.g. to make the existing pattern rewrites in InstSimplify/Combine and DAGCombiner work also for constrained fp (@uweigand) and complex arithmetic (@greened). This may actually speed up things since we can pursue VP and generalized pattern match in parallel.

I'd like to rant a little bit to see if anyone agrees with my probably unpopular opinion...

Code explosion is the symptom, not the sickness. It's caused by using experimental intrinsics. Experimental intrinsics are a detriment to progress. They end up creating a ton more work and are designed to be inevitably replaced.

Actually, the idea behind the generalized pattern code is to offer a way to gradually transition from intrinsics to native instruction support without disturbing transformations. The pattern matcher is templatized to match the intrinsics first (through utility classes). When the transition to native IR support is complete, one template-instantiation of the pattern rewriter gets dropped and the code bloat is undone. E.g. in the case of VP, eventually PatternMatch will only ever be instantiated for the PredicatedContext and no longer for the special case of the EmptyContext. However, initially (in this patch) the pattern matcher is still instantiated for both kinds of context.
We can use the same mechanism to lift existing optimizations to complex arithmetic intrinsics. In that case, the matcher context would require that all constituent operations are complex number operators. The builder consuming the context will emit complex operations.

I do understand the desire for these intrinsics by some -- i.e. devs that don't care about these new features aren't impacted. I think that's short sighted though. Hiding complexity behind utility functions will be painful when debugging tough problems. And updating existing code to use pattern matchers is a lot of churn -- probably more churn than making the constructs first-class citizens.

Sure. You know it's tempting to just duplicate all OpCodes (OpCodes v2) and redesign them (all of them..) to support all of this from the start: a) masking (also for scalar ops), b) an active vector length, c) constrained fp.
If you want the existing transformations to work with OpCodes v2, you'd need exactly the same pattern generalizations, btw. In the end, whether it's native instructions or intrinsics does not matter that much.

IMHO, we'd be better off baking these new features into LLVM right from the start. These 3 topics are fairly significant features. It would be hard to argue that any one will go out of style in the foreseeable future...

I'd say LLVM is long past the starting line. If we just turn on predication on regular IR instructions, many existing instruction transformations will break. You'd need one monster commit that does the switch and fixes all these transformations at the same time.

Code explosion is the symptom, not the sickness. It's caused by using experimental intrinsics. Experimental intrinsics are a detriment to progress. They end up creating a ton more work and are designed to be inevitably replaced.

I think this is a big hammer argument for a nuanced topic.

We have used experimental intrinsics for a large number of disparate concepts, from exception handling to fuzzy vector extensions, and then after the semantics was defined and accepted, we baked the concepts into IR.

This is a proven track, and predication is a very similar example to past experiences, I see no contradiction here.

IMHO, we'd be better off baking these new features into LLVM right from the start. These 3 topics are fairly significant features. It would be hard to argue that any one will go out of style in the foreseeable future...

The risk of getting it wrong and having to re-bake into IR is high. We've done that with exception handling before and it wasn't pretty.

Predication is already in native IR form, albeit complex and error prone. The nuances across targets are too many to have a simple implementation working for everyone, and having a concrete implementation of the idea in intrinsic form may help clear up the issues before we stick to anything.

It's quite possible, and I really hope, that only a few targets will actually implement them, and that will be enough, so intrinsics will be short lived. Meanwhile, previous IR patterns will still match, so nothing is lost.

Of course, as with any intrinsic, it's quite possible that it will "just work" and people will give up half-way through. But history has shown that more often than not, these group efforts finish with a reasonable implementation, better than what we had before.

cheers,
--renato

+1 to what Renato said, I like this direction!
FWIW: we are working on Arm's M-profile Vector Extension (MVE), another vector extension for which this is very useful.

Code explosion is the symptom, not the sickness. It's caused by using experimental intrinsics. Experimental intrinsics are a detriment to progress. They end up creating a ton more work and are designed to be inevitably replaced.

I think this is a big hammer argument for a nuanced topic.

That's fair. But we can talk specifics too. We already have a lot of this functional and optimized in Clang. E.g.:

#include <stdio.h>

#pragma STDC FENV_ACCESS ON

void foo(double a[], double b[]) {
  double res[8];
  for(int i = 0; i < 8; i++)
    if (b[i] != 0.0)
      res[i] = a[i] / b[i];

  printf("%f\n", res[0]);
}
vmovupd (%rsi), %zmm0           #  test.c:8:9
vxorpd  %xmm1, %xmm1, %xmm1     #  test.c:8:14
vcmpneqpd       %zmm1, %zmm0, %k1 #  test.c:8:14
vmovupd (%rdi), %zmm1 {%k1} {z} #  test.c:9:16
vdivpd  %zmm0, %zmm1, %zmm0 {%k1} #  test.c:9:21
vmovupd %zmm0, (%rsp) {%k1}     #  test.c:9:14
vmovsd  (%rsp), %xmm0           #  test.c:11:18

That said, there's a large amount of technical debt from carrying these changes locally. I'd like to get out of that debt. That's why I'd like to avoid the experimental intrinsics detour.

I will also note that we care about a limited number of targets. So to be fair, take that into consideration.

We have used experimental intrinsics for a large number of disparate concepts, from exception handling to fuzzy vector extensions, and then after the semantics was defined and accepted, we baked the concepts into IR.

This is a proven track, and predication is a very similar example to past experiences, I see no contradiction here.

That's a fair argument too. I wasn't monitoring those projects, so I don't know the specifics.

Predication, Complex, and FPEnv require a massive amount of intrinsics to work though. Pretty much duplicating every operator (and target specific intrinsic for FPEnv). And probably some others I've forgotten. That seems like an unreasonable amount of intrinsics to me. But if others with experience in experimental intrinsics think it's manageable, I can't really argue.

IMHO, we'd be better off baking these new features into LLVM right from the start. These 3 topics are fairly significant features. It would be hard to argue that any one will go out of style in the foreseeable future...

The risk of getting it wrong and having to re-bake into IR is high. We've done that with exception handling before and it wasn't pretty.

Predication is already in native IR form, albeit complex and error prone. The nuances across targets are too many to have a simple implementation working for everyone, and having a concrete implementation of the idea in intrinsic form may help clear up the issues before we stick to anything.

It's quite possible, and I really hope, that only a few targets will actually implement them, and that will be enough, so intrinsics will be short lived. Meanwhile, previous IR patterns will still match, so nothing is lost.

Of course, as with any intrinsic, it's quite possible that it will "just work" and people will give up half-way through. But history has shown that more often than not, these group efforts finish with a reasonable implementation, better than what we had before.

Another good argument. And this isn't really the hill I want to die on. But it just seems silly to me to implement something twice: Occam's razor. We'll have to work the kinks out somewhere -- so why not push directly to the goal...

But it just seems silly to me to implement something twice: Occam's razor. We'll have to work the kinks out somewhere -- so why not push directly to the goal...

I see where you're coming from, but hindsight is 20/20. Implementing something twice, when the first one is a prototype, means you can make a lot of mistakes on the first iteration.

If the cost of changing the IR outweighs the prototyping costs (it usually does), then the overall cost is lower, even if for a longer period.

The current proposals are interlinked, so I don't think there will be combinatorial explosion, or even multiplication of intrinsics. I hope that we'll figure out the best way to represent that into IR sooner because of that.

This is not the first time that we try to get those into IR proper, either. All previous times we started with a "change the IR" approach and could never reach agreement.

Intrinsics give us the prototype route: low implementation cost, low impact, easy to clean up later. It does add clutter in between, but that impact can also be limited to one or two targets of the willing sub-communities.

LLVM is a very fast moving target, stopping the world to get the IR "right" doesn't work.

A good example to look for is the scalable vector IR changes, which have gone through multiple attempts, have been going on for many years, and are still not complete...

These things take time, rushing it usually backfires. :)

LLVM is a very fast moving target, stopping the world to get the IR "right" doesn't work.

A good example to look for is the scalable vector IR changes, which have gone through multiple attempts, have been going on for many years, and are still not complete...

These things take time, rushing it usually backfires. :)

Ha, yeah. All good points. I'll let this drop...

simoll updated this revision to Diff 226913.Oct 29 2019, 9:34 AM
Updates
  • Fixed several intrinsic attributes.
  • All fp intrinsics are constrained (identically to the llvm.experimental.constrained.* ones). They behave like regular fp ops if fpexcept.ignore is passed (see the sketch after this list).
  • Bitcode verifier test.
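For illustration, a VP fp call under this scheme might look as follows (the argument order is an assumption of this prototype; with round.tonearest and fpexcept.ignore it behaves like a plain predicated fadd):

%r = call <8 x double> @llvm.vp.fadd.v8f64(<8 x double> %a, <8 x double> %b, metadata !"round.tonearest", metadata !"fpexcept.ignore", <8 x i1> %m, i32 %evl)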
Observations
  • When using fpexcept.ignore, the fp callsites should have the readnone attribute set on them to override the inaccessiblememonly of the intrinsic declarations. That way DCE still works.
  • The rules for constrained fp (strictfp on the function definition, only constrained fp in that function) apply only if there is at least one fp op with exceptions in the function. That is, strictfp is not necessary when all fp ops have fpexcept.ignore.
  • When the exception behavior is not fpexcept.ignore, the fp op of the intrinsic is not revealed (getFunctionalOpcode(..) returns Call in that case).
  • (FIXME) NoCapture does not work on vectors of pointers.
Next steps

As mentioned earlier, generalized pattern matching will be part of a separate RFC (although it's still included in this reference implementation).
I'd like to discuss the actual intrinsic signatures next. For that I will upload a new minimal patch for integer intrinsic support.

Hi Simon, I went through the code for the first time, and this is a first round of proper nitpicks from my side. Please ignore if you want to focus on the bigger picture at this point in the discussion, but these are just some things I noticed. A general nitpick is that you should run clang-format, as there are quite a few coding style issues: indentation, indentation of arguments, exceeding 80 columns, placement of * and & in arguments and return values, etc. You'll find some more nitpicks inline.

llvm/include/llvm/CodeGen/ISDOpcodes.h
489 ↗(On Diff #226913)

I was unfamiliar with this one... I think I know what it does, and how it is different from VP_SELECT, but for clarity, can you define what an integer pivot is?

492 ↗(On Diff #226913)

typo: hether

1134 ↗(On Diff #226913)

just spell out 'otherwise' here, and also below.

llvm/include/llvm/CodeGen/SelectionDAGNodes.h
727 ↗(On Diff #226913)

Perhaps an outdated comment? Should it be something along the lines of 'vector predicated node' instead of 'explicit vector length node'?

1492 ↗(On Diff #226913)

indentation of || off by 1?

2357 ↗(On Diff #226913)

VP_LOAD and VP_STORE?

2386 ↗(On Diff #226913)

same?

2426 ↗(On Diff #226913)

'.. does a truncation before store' sounds a bit odd. Since 'truncating store' is a well-known term, and you explain what it is for ints/floats below, I think it suffices to say "Return true if this is a truncating store. For integers ..."

Observations
  • When using fpexcept.ignore, the fp callsites should have the readnone attribute set on them to override the inaccessiblememonly of the intrinsic declarations. That way DCE still works.

Wouldn't that allow the call to be moved relative to other calls? Specifically, we need to make sure intrinsics aren't moved relative to calls that change the rounding mode. The "inaccessiblememonly" attribute is meant to model both the reading of control modes and the possible setting of status flags or raising of exceptions.

  • The rules for constrained fp (strictfp on the function definition, only constrained fp in that function) apply only if there is at least one fp op with exceptions in the function. That is, strictfp is not necessary when all fp ops have fpexcept.ignore.

I don't think this is right. Even if there are no constrained FP operations in the function we might have math library calls for which the strictfp attribute is needed to prevent libcall simplification and constant folding that might violate the rounding mode.

Observations
  • When using fpexcept.ignore, the fp callsites should have the readnone attribute set on them to override the inaccessiblememonly of the intrinsic declarations. That way DCE still works.

Wouldn't that allow the call to be moved relative to other calls? Specifically, we need to make sure intrinsics aren't moved relative to calls that change the rounding mode. The "inaccessiblememonly" attribute is meant to model both the reading of control modes and the possible setting of status flags or raising of exceptions.

  • The rules for constrained fp (strictfp on the function definition, only constrained fp in that function) apply only if there is at least one fp op with exceptions in the function. That is, strictfp is not necessary when all fp ops have fpexcept.ignore.

I don't think this is right. Even if there are no constrained FP operations in the function we might have math library calls for which the strictfp attribute is needed to prevent libcall simplification and constant folding that might violate the rounding mode.

I see. Since we need to model the default fp environment with these intrinsics (and this is our priority), let me make the following suggestion: VP intrinsics will have a rounding mode and exception behavior argument from the start, but the only allowed values are "round.tonearest" and "fpexcept.ignore". Once we have a solution for the general case implemented for constrained fp, we will unlock that feature also for LLVM VP.

simoll marked 8 inline comments as done.Oct 30 2019, 2:06 AM

Hi Simon, I went through the code for the first time, and this is a first round of proper nitpicks from my side. Please ignore if you want to focus on the bigger picture at this point in the discussion, but these are just some things I noticed. A general nitpick is that you should run clang-format, as there are quite a few coding style issues: indentation, indentation of arguments, exceeding 80 columns, placement of * and & in arguments and return values, etc. You'll find some more nitpicks inline.

Hi Sjoerd, thanks for your comments! I've fixed the inline nitpicks right away. I'll do a style pass for the actual commits.

Hi Simon, I went through the code for the first time, and this is a first round of proper nitpicks from my side. Please ignore if you want to focus on the bigger picture at this point in the discussion, but these are just some things I noticed. A general nitpick is that you should run clang-format, as there are quite a few coding style issues: indentation, indentation of arguments, exceeding 80 columns, placement of * and & in arguments and return values, etc. You'll find some more nitpicks inline.

Hi Sjoerd, thanks for your comments! I've fixed the inline nitpicks right away. I'll do a style pass for the actual commits.

Cheers. Just curious, what are your next steps? People can correct me if I'm wrong, but my impression is that with the RFC, this prototype, and the discussion at the US LLVM dev conference, there is consensus and people are on board with the general idea and direction. There are still some discussions on e.g. the (constrained) FP part, but would it now be time to split this up into separate commits, like an INT and an FP part (if that makes sense), so that they can be progressed separately?

simoll added a comment.EditedNov 4 2019, 4:30 AM

Hi Simon, I went through the code for the first time, and this is a first round of proper nitpicks from my side. Please ignore if you want to focus on the bigger picture at this point in the discussion, but these are just some things I noticed. A general nitpick is that you should run clang-format, as there are quite a few coding style issues: indentation, indentation of arguments, exceeding 80 columns, placement of * and & in arguments and return values, etc. You'll find some more nitpicks inline.

Hi Sjoerd, thanks for your comments! I've fixed the inline nitpicks right away. I'll do a style pass for the actual commits.

Cheers. Just curious, what are your next steps? People can correct me if I'm wrong, but my impression is that with the RFC, this prototype, and the discussion at the US LLVM dev conference, there is consensus and people are on board with the general idea and direction. There are still some discussions on e.g. the (constrained) FP part, but would it now be time to split this up into separate commits, like an INT and an FP part (if that makes sense), so that they can be progressed separately?

Yes, I hope that's where things are right now ;-) I am planning to go by functional slices. Each slice comes with IR-level intrinsics, TTI support, basic lowering to standard IR, SelectionDAG support and tests.

I am preparing the first patchset for integer support atm.

Slices:

  • Integer slice.
  • Memory slice.
  • Reduction slice.
  • FP (with unconstrained metadata args) slice.

Standalone patch:

  • Mask, VectorLength and Passthru attributes (in preparation for vector function calls).

Pending discussion/separate RFC:

  • Constrained FP (being able to fully optimize constrained fp intrinsics in the default fp env).
  • Generalized pattern match (aka optimizing VP).
simoll added a comment.Nov 6 2019, 1:13 AM

D69552: Move floating point related entities to namespace level contains the fp enum changes required for LLVM-VP. Referencing the patch here.

simoll updated this revision to Diff 228052.Nov 6 2019, 6:27 AM

Fixed attribute placements, signatures, more tests, ..
This is in sync with the subpatch #1 of the integer slice (https://reviews.llvm.org/D69891).

simoll added a comment.Nov 6 2019, 6:29 AM

Integer slice patches

#1 IR-level support: https://reviews.llvm.org/D69891
#2 TTI & Legalization: <stay tuned>
#3 ISel patch: <stay tuned>

I'll update this comment as we go to keep track of the integer slice.

simoll updated this revision to Diff 228885.Nov 12 2019, 6:48 AM

Changes

  • VPIntrinsics.def file.
  • Pass vlen i32 -1 to enable all lanes with scalable vector types.
  • Various NFC fixes.

Moving the discussion from the integer patch alley to the main RFC, as this is about the general design of VP intrinsics: it's about having a passthru operand (as in llvm.masked.load) and whether %evl should be a parameter of the intrinsics or modelled differently.

@SjoerdMeijer https://reviews.llvm.org/D69891#inline-636845
and if I'm not mistaken we are now discussing if undef here should be undef or a passthru

@rkruppe https://reviews.llvm.org/D69891#inline-637215
I previously felt that passthru would be nice to have for backend maintainers (including myself) but perhaps not worth the duplication of IR functionality (having two ways to do selects). However, given the differences I just described, I don't think "just use select" is workable.

Ok. I do agree that having passthru simplifies isel for certain architectures (and legal combinations of passthru value, type, and operations..) but:
VP intrinsics aren't target intrinsics: they are not supposed to be a way to directly program any specific ISA in the way of a macroassembler, like you would do with llvm.x86.* or llvm.arm.mve.* or any other. Rather, think of them as regular IR instructions. Pretend that anything we propose in VP intrinsics will end up as a feature of first-class LLVM instructions. Based on that I figured that one VP intrinsic should match one IR instruction plus predication, nothing more.

  • If we had predicated IR instructions, would we want them to have a passthru operand?
  • The prototype shows that defining VP intrinsics with undef-on-masked-out makes it straightforward to generalize InstSimplify/InstCombine/DAGCombiner such that they can optimize VP intrinsics. If you add a passthru operand then logically VP intrinsics start to behave like two instructions: that could be made to work but it would be messier as you'd have to peek through selects, etc.

@sdesmalen https://reviews.llvm.org/D69891#1750287
If we want to solve the select issue and also keep the intrinsics simple, my suggestion was to combine the explicit vector length with the mask using an explicit intrinsic like @llvm.vp.enable.lanes. Because this is an explicit intrinsic, the code-generator can simply extract the %evl parameter and pass that directly to the instructions for RVV/SXA. This is what happens for many other intrinsics in LLVM already, like masked.load/masked.gather that support only a single addressing mode, where it is up to the code-generator to pick apart the value into operands that are suited for a more optimal load instruction.

Without having heard your thoughts on this suggestion, I would have to guess that your reservation is the possibility of LLVM hoisting/separating the logic that merges predicate mask and %evl value in some way. That would mean having to do some tricks (think CodeGenPrep) to keep the values together and recognizable for CodeGen. And that's the exact same thing we would like to avoid for supporting merging/zeroing predication, hence the suggestion for the explicit passthru parameter.

That's not quite the same:
%evl is mapped to a hardware register on SX-Aurora. We cannot simply reconstitute the %evl from any given mask; if %evl is obscured, all operations that depend on it become less efficient because we need to default to the full vector length. Now, if the select is separated from the VP intrinsic, you simply emit one select instruction (and it should be possible to hoist it back and merge it with the VP intrinsic in most cases (.. and you probably want an optimization that does that anyway because there will be code with explicit selects even with passthru)). Besides, if folding the select into an instruction yields something simpler, then that's actually an argument in favor of explicit selects: passthru makes this implicit.

Ok. I do agree that having passthru simplifies isel for certain architectures (and legal combinations of passthru value, type, and operations..) but:
VP intrinsics aren't target intrinsics: they are not supposed to be a way to directly program any specific ISA in the way of a macroassembler, like you would do with llvm.x86.* or llvm.arm.mve.* or any other. Rather, think of them as regular IR instructions. Pretend that anything we propose in VP intrinsics will end up as a feature of first-class LLVM instructions. Based on that I figured that one VP intrinsic should match one IR instruction plus predication, nothing more.

  • If we had predicated IR instructions, would we want them to have a passthru operand?

I think that would probably be a reasonable clean-slate IR design, though I am not at all sure if it would be better. I wasn't specifically advocating for passthru operands, though. I agree that not having passthru and performing the same function in two operations can be readily pattern-matched by backends at modest effort and failing to match it has low cost. My main point was just that the existing select instruction is not sufficient as the second operation, for essentially the same reason why the VP intrinsics have an EVL argument instead of just the mask. Creating a VP equivalent of select (as already sketched in the other thread) resolves that concern just as well.

simoll marked an inline comment as done.Dec 3 2019, 4:18 AM
simoll added inline comments.
llvm/docs/LangRef.rst
15283–15300 ↗(On Diff #228885)

@rkruppe
[..] My main point was just that the existing select instruction is not sufficient as the second operation, for essentially the same reason why the VP intrinsics have an EVL argument instead of just the mask. Creating a VP equivalent of select (as already sketched in the other thread) resolves that concern just as well.

I agree. The prototype has defined such an llvm.vp.select from the get-go.

rkruppe added inline comments.Dec 3 2019, 10:06 AM
llvm/docs/LangRef.rst
15283–15300 ↗(On Diff #228885)

Oops, missed that / forgot about it. Sorry for the noise.

Is there a reason why it's not in the "integer slice" patch? It's not integer-specific, but it seems to fit even less into the other slices.

simoll marked an inline comment as done.Dec 9 2019, 12:52 AM
simoll added inline comments.
llvm/docs/LangRef.rst
15283–15300 ↗(On Diff #228885)

For one, I wanted to keep the integer patch concise. Also, having played around with this for a while now, I think that the signature of vp.select should be:

llvm.vp.select(<W x i1> %m, %onTrue, %onFalse, i32 %threshold, i32 vlen %evl)

meaning that values from %onTrue are selected where %m is true and the lane index is below %threshold; %onFalse is selected otherwise. Lane indices greater than or equal to %evl are undef, as before. In short: there is just one "merge" operation and no more separate vp.compose.
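Spelled out lane by lane (a sketch of the proposed semantics; this threshold variant is a proposal at this point, not committed code):

%r = call <8 x i32> @llvm.vp.select.v8i32(<8 x i1> %m, <8 x i32> %onTrue, <8 x i32> %onFalse, i32 %threshold, i32 %evl)
; lane i of %r:
;   i >= %evl                     -> undef
;   %m[i] set and i < %threshold  -> %onTrue[i]
;   otherwise                     -> %onFalse[i]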

Matt added a subscriber: Matt.Dec 18 2019, 12:57 PM
simoll planned changes to this revision.Jan 9 2020, 4:59 AM
  • Add pivot/threshold argument to llvm.vp.select and remove llvm.vp.compose
  • Clarify documentation on preserved lanes
  • The explicit vlen arg is either negative or (new requirement) less than or equal to the number of lanes of the operation.

(not sure if I should continue here or in D69891, will try here first)

Sorry for dipping out of this discussion. That is, after our "passthru discussion" I wanted to do more homework to make sure a "separate select" would work for us and that we wouldn't miss anything, but then other work happened and I never got round to this. But I am still very interested, so dipping back in :-/

If we had predicated IR instructions, would we want them to have a passthru operand?

I think that would probably be a reasonable clean-slate IR design, though I am not at all sure if it would be better.

One of the problems I had was that I found it difficult to see all the consequences and answer the sort of questions asked above (also because I haven't yet spent enough time on this). For example, being explicit in IR is in general a good thing to do? So yes, why not a passthru? But then I had the same question as Robin, not sure it would be better. The other thing I would mention again is that I think convenience is a pretty strong argument too; if this is most convenient for at least two / three other architectures, then why not? But then you could argue that it is simple to patch up with a select, and we're going in circles... At least the concern Robin brought up about the select seems to be addressed with the vp.select.

(not sure if I should continue here or in D69891, will try here first)

Yep, the RFC is the right place for conceptual discussions.

Sorry for dipping out of this discussion. That is, after our "passthru discussion" I wanted to do more homework to make sure a "separate select" would work for us and that we wouldn't miss anything, but then other work happened and I never got round to this. But I am still very interested, so dipping back in :-/

Welcome back :)

If we had predicated IR instructions, would we want them to have a passthru operand?

I think that would probably be a reasonable clean-slate IR design, though I am not at all sure if it would be better.

One of the problems I had was that I found it difficult to see all the consequences and answer the sort of questions asked above (also because I haven't yet spent enough time on this). For example, being explicit in IR is in general a good thing to do? So yes, why not a passthru? But then I had the same question as Robin, not sure it would be better. The other thing I would mention again is that I think convenience is a pretty strong argument too; if this is most convenient for at least two / three other architectures, then why not? But then you could argue that it is simple to patch up with a select, and we're going in circles... At least the concern Robin brought up about the select seems to be addressed with the vp.select.

Couldn't agree more. I guess we just do not know at this point.. how about we move the discussion away from "which would be better?" to "if we decide for A now and later strongly realize that B would have been the right call, how bad a u-turn would that be?"

Changes required going from passthru to select:

  • IR: modernize VP with passthru to intrinsic+select
  • Nothing more.. since we already had to implement the select+intrinsic matching logic anyway to fuse explicit selects into passthru operands.
  • Dead code: all the logic for dealing with the passthru operand: PatternMatch for passthru (instcombine, instsimplify, known bits..), etc

Changes required going from select to passthru:

  • IR: modernize and pass 'undef' as passthru
  • Implement that pass from the other scenario that folds select into passthru (and all the additional logic for dealing with passthru).
  • Dead code: none

My point here is that no matter how we decide: explicit selects and vp intrinsics will co-exist and have to be folded/optimized. However, in the explicit-select scenario we do not have to teach LLVM about passthru operands (PatternMatch -> InstCombine, ...).
Btw, I guess that https://reviews.llvm.org/D71432 shows that op+select folding can be cleanly implemented in isel and that's also in line with my experiments for the VE target.
Regarding convenience: the IRBuilder could have, e.g., a ::CreatePredicatedFAdd with an explicit (optional) passthru operand, resulting in a VP op + select; see the sketch below.
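A sketch of the "VP op + select" pattern such a builder could emit (signatures assumed as elsewhere in this prototype; a vp.select would be used where the EVL matters):

%sum = call <8 x float> @llvm.vp.fadd.v8f32(<8 x float> %a, <8 x float> %b, <8 x i1> %m, i32 %evl)
%res = select <8 x i1> %m, <8 x float> %sum, <8 x float> %passthru   ; foldable into a merging/zeroing op during ISel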

Couldn't agree more. I guess we just do not know at this point.. how about we move the discussion away from "which would be better?" to "if we decide for A now and later strongly realize that B would have been the right call, how bad a u-turn would that be?"

Changes required going from passthru to select:

  • IR: modernize VP with passthru to intrinsic+select
  • Nothing more.. since we already had to implement the select+intrinsic matching logic anyway to fuse explicit selects into passthru operands.
  • Dead code: all the logic for dealing with the passthru operand: PatternMatch for passthru (instcombine, instsimplify, known bits..), etc

Changes required going from select to passthru:

  • IR: modernize and pass 'undef' as passthru
  • Implement that pass from the other scenario that folds select into passthru (and all the additional logic for dealing with passthru).
  • Dead code: none

My point here is that no matter how we decide: explicit selects and vp intrinsics will co-exist and have to be folded/optimized. However, in the explicit-select scenario we do not have to teach LLVM about passthru operands (PatternMatch -> InstCombine, ...).
Btw, I guess that https://reviews.llvm.org/D71432 shows that op+select folding can be cleanly implemented in isel and that's also in line with my experiments for the VE target.
Regarding convenience: the IRBuilder could have, e.g., a ::CreatePredicatedFAdd with an explicit (optional) passthru operand, resulting in a VP op + select.

Thanks for summarising this. Fair enough, I think this sounds like a (good) plan.
I will continue in D69891, and will leave a comment there.

Btw, I guess that https://reviews.llvm.org/D71432 shows that op+select folding can be cleanly implemented in isel and that's also in line with my experiments for the VE target.

This needs a caveat. Keeping the select glued to the operation takes some careful effort. Especially in the undef passthru case, there are a bunch of peeps that will incorrectly fold away the select. E.g. this transform from InstSimplify:

if (isa<UndefValue>(FalseVal))   // select ?, X, undef -> X
  return TrueVal;

The VP intrinsics will certainly be immune to these, but if the plan is to eventually replace the VP select intrinsics with IR selects, then this problem will need to be solved. Just a heads up...
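To make the hazard concrete, a minimal sketch of the pattern that would be miscompiled if the select were meant as predication:

%d = fdiv <4 x float> %a, %b                                ; unpredicated: may trap/raise on lanes %m disables
%r = select <4 x i1> %m, <4 x float> %d, <4 x float> undef
; InstSimplify folds 'select ?, X, undef -> X', so %r becomes %d and the masking intent is lost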

Btw, I guess that https://reviews.llvm.org/D71432 shows that op+select folding can be cleanly implemented in isel and that's also in line with my experiments for the VE target.

This needs a caveat. Keeping the select glued to the operation takes some careful effort. Especially in the undef passthru case, there are a bunch of peeps that will incorrectly fold away the select. E.g. this transform from InstSimplify:

if (isa<UndefValue>(FalseVal))   // select ?, X, undef -> X
  return TrueVal;

The VP intrinsics will certainly be immune to these, but if the plan is to eventually replace the VP select intrinsics with IR selects, then this problem will need to be solved. Just a heads up...

As Eli argued in that patch, IR like select %m, (constrained.fadd %a, %b), %passthru is not expressing a predicated vector add, and must not be selected as such. The IR semantics are unambiguously: first a full vector add is performed (with all exceptions etc. that entails, or possible UB in related cases like integer division) and then some of the resulting lanes are replaced with values from %passthru. To predicate the fadd itself, a dedicated operation/intrinsic is needed. LLVM IR does not currently (and should not) change the meaning of the regular unpredicated operations based on (some? any?) uses of the value being a select. The only thing a select (or vp.select) can do is alter the lanes of a vector after it has been computed, it cannot travel back in time to change how it was computed.

VP intrinsics are the aforementioned predicated operations: in certain lanes, no computation (which might raise FP exceptions, have UB, etc.) happens and the resulting vector has some "default value" instead. The present discussion about whether to include a %passthru argument is just about how this default value is determined. But this does not change that the operation itself is predicated, it just affects how you express e.g. the patterns that map to SVE's zeroing and merging predication.

As Eli argued in that patch, IR like select %m, (constrained.fadd %a, %b), %passthru is not expressing a predicated vector add, and must not be selected as such. The IR semantics are unambiguously: first a full vector add is performed (with all exceptions etc. that entails, or possible UB in related cases like integer division) and then some of the resulting lanes are replaced with values from %passthru. To predicate the fadd itself, a dedicated operation/intrinsic is needed. LLVM IR does not currently (and should not) change the meaning of the regular unpredicated operations based on (some? any?) uses of the value being a select. The only thing a select (or vp.select) can do is alter the lanes of a vector after it has been computed, it cannot travel back in time to change how it was computed.

VP intrinsics are the aforementioned predicated operations: in certain lanes, no computation (which might raise FP exceptions, have UB, etc.) happens and the resulting vector has some "default value" instead. The present discussion about whether to include a %passthru argument is just about how this default value is determined. But this does not change that the operation itself is predicated, it just affects how you express e.g. the patterns that map to SVE's zeroing and merging predication.

Understood. I now see that we already discussed this here in October.

Your current argument sounds like it argues for explicit passthrus. E.g.:

select %m, (vp.fadd %m, %a, %b), zeroinitializer

On SVE, this would become something like:

movprfx z0.s, p0/z, z0.s
fadd z0.s, p0/m, z0.s, z1.s

Isn't that traveling back in time to change how the inactive elements are defined? To be true to the IR, we'd want something like:

fadd z0.s, p0/m, z0.s, z1.s
sel z0.s, p0/m, z0.s, <zero_vector>

How do we justify that this case is different from the op+select -> predicated_op case? Are we assuming the implicit undef on the VP intrinsic allows for it?

I'm not sure what problem you think there might be? Both code sequences do the same thing (same side effects, same final result) as the input IR they matched, right? So that's what justifies them both as valid outputs and the choice is just a matter of codegen quality. You don't even need to appeal to the vp.fadd producing undef in disabled lanes, because in the final result those lanes are zero anyway and that's all that matters. This doesn't seem fundamentally more tricky than any other isel pattern that matches multiple IR instructions to produce a more efficient combined instruction. For example, if the ARM backend selects add i32 %a, (shl i32 %b, 4) as add r0, r0, r1, lsl #4, it never materializes shl %b, 4 (not into a register, at least) but the end result is still correct.

I'm not sure what problem you think there might be? Both code sequences do the same thing (same side effects, same final result) as the input IR they matched, right?

Ah, right. That side effects are the difference. Thanks for reminding me.

So that's what justifies them both as valid outputs and the choice is just a matter of codegen quality. You don't even need to appeal to the vp.fadd producing undef in disabled lanes, because in the final result those lanes are zero anyway and that's all that matters. This doesn't seem fundamentally more tricky than any other isel pattern that matches multiple IR instructions to produce a more efficient combined instruction. For example, if the ARM backend selects add i32 %a, (shl i32 %b, 4) as add r0, r0, r1, lsl #4, it never materializes shl %b, 4 (not into a register, at least) but the end result is still correct.

Yeah, this was what I was hung up on. I didn't see the difference between something like not materializing a dead instruction and masking an inactive element. But, yeah, the side effects would not be the same.

Btw, I guess that https://reviews.llvm.org/D71432 shows that op+select folding can be cleanly implemented in isel and that's also in line with my experiments for the VE target.

This needs a caveat. Keeping the select glued to the operation takes some careful effort. Especially in the undef passthru case, there are a bunch of peeps that will incorrectly fold away the select. E.g. this transform from InstSimplify:

if (isa<UndefValue>(FalseVal))   // select ?, X, undef -> X
  return TrueVal;

The VP intrinsics will certainly be immune to these, but if the plan is to eventually replace the VP select intrinsics with IR selects, then this problem will need to be solved. Just a heads up...
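
For illustration, a sketch of how that fold decouples the pair (same vp.fadd signature as elsewhere in this thread, illustrative only):

%v = call <8 x double> @llvm.vp.fadd.f64(<8 x double> %a, <8 x double> %b, <8 x i1> %m, i32 -1)
%r = select <8 x i1> %m, <8 x double> %v, <8 x double> undef

; "select ?, X, undef -> X" replaces %r with plain %v; the result is still
; correct, but the explicit select that isel wanted to pattern-match into a
; zeroing/merging instruction is gone.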

@hsaito and I had a discussion about this earlier today. I had the same concern: optimizations after the vectorizer might decouple the vp.select from the vp.{operation}. That could leave the code generator unable to create a masked operation with passthru on targets that support one, and thus potentially invalidate the cost model assumptions the vectorizer made when it generated the predicated operation. Hideki convinced me that the additional freedom from explicit dependencies gained by not having a passthru argument as part of the predicated operation was likely to be more beneficial than tight coupling. If we ever do find this to be a problem, we can make the intervening optimizations less aggressive with this sort of pattern.

I also talked briefly with @craig.topper about the X86 codegen handling of this, and his off-the-cuff reaction was that we probably won't have any problem generating the desired passthru+masked instructions from separated vp.select operations.

llvm/docs/Proposals/VectorPredication.rst
2 ↗(On Diff #228885)

Is there any reason that some form of this document can't be committed now? We have at least enough support to claim this as a community wide proposal, right?

(This was gonna be an inline comment on D69891, but it's more of a general conceptual issue, so I decided to move it here.)

Right now, LangRef changes in D69891 describe the restriction on the EVL value as this:

The explicit vector length (%evl) is only effective if it is non-negative, and when that is the case, its value is in the range:

0 <= %evl <= W,   where W is the vector length.

The restriction is good, but this wording doesn't specify what happens when %evl is not in that range. Some sort of undefined behavior, I assume, but this must be explicitly stated, especially since there are many ways in which it could be undefined. I don't recall previous discussion of this detail and I don't know what you have in mind, but some possibilities I see:

  1. The instruction has capital-UB undefined behavior. This gives the greatest flexibility to backends (e.g., allows generation of code that traps if %evl is too large) but I don't know of any architecture that needs this much flexibility and it constrains IR optimizations (code hoisting etc.) the most.
  2. The instruction returns poison (i.e., all result lanes are poison) and all lanes are (potentially, non-deterministically) enabled regardless of the mask parameter. This is less restrictive for IR optimizations (e.g., integer vp.add can unconditionally be speculated) but still allows backends to unconditionally use SETVL-style "stripmining" instructions that are not generally consistent (across architectures) w.r.t. which lanes become active when a vector length greater than the hardware vector length is requested.
  3. %EVLmask is undef, that's all. As a consequence, lanes disabled by the %mask argument definitely stay disabled, but for other lanes (where the mask has a 1 or an undef) it's non-deterministic whether they are active. As far as I can see, this has pretty much the same implications for IR optimizations and backends (excluding hypothetical pathological architectures) but is less of a special case to specify and directly captures the diversity of hardware behavior that (presumably) motivates this restriction on EVL.

Off the cuff, I would suggest the last option.
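
To make option 3 concrete, a sketch of the lane-enable semantics it implies (the %EVLmask notation is the conceptual mask derived from %evl, as used in the options above; W is the lane count):

; %EVLmask = icmp ult <W x i32> <0, 1, ..., W-1>, (splat %evl)   when 0 <= %evl <= W
; %EVLmask = undef                                               otherwise
; a lane is active only where both %mask and %EVLmask are 1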

We (Libre-SoC, provisionally renamed from Libre-RISCV) are currently building a processor that supports variable-length vector operations by having each operation specify the starting register in a flat register file, then relying on VL telling it how many elements to operate on, which, when divided by the number of elements per register, directly translates to the number of registers to operate on. So, if VL is out of bounds, the instructions can overwrite registers past the end of the range assigned by the register allocator and/or trap. This would probably force use of option #1 above, at least for our processor. Our ISA design is still incomplete, so we might add (or already have) a mechanism allowing use of option #2 or #3 if there is a sufficient reason (will have to see what the rest of Libre-SoC think).

Presumably you have an efficient way to somehow force the VL into the intended range to support strip-mining of loops? The exact strategy doesn't matter, anything that avoids VL being "out of bounds" should make the other options work just fine. (Assuming there aren't other, larger problems with mapping VP operations to your ISA.)

Yes, we do (setvl has an immediate for max VL, which needs to be calculated by the register allocator or similar), though it can be bypassed by writing directly to the VL register.

So, in that case, we should be able to use option #2 or #3, as long as the compiler doesn't write to VL by any means other than setvl.

simoll marked an inline comment as done. Feb 3 2020, 3:35 AM

(This was gonna be an inline comment on D69891, but it's more of a general conceptual issue, so I decided to move it here.)

Right now, LangRef changes in D69891 describe the restriction on the EVL value as this:

The explicit vector length (%evl) is only effective if it is non-negative, and when that is the case, its value is in the range:

0 <= %evl <= W,   where W is the vector length.

The restriction is good, but this wording doesn't specify what happens when %evl is not in that range. Some sort of undefined behavior, I assume, but this must be explicitly stated, especially since there are many ways in which it could be undefined. I don't recall previous discussion of this detail and I don't know what you have in mind, but some possibilities I see:

  1. The instruction has capital-UB undefined behavior. This gives the greatest flexibility to backends (e.g., allows generation of code that traps if %evl is too large) but I don't know of any architecture that needs this much flexibility and it constrains IR optimizations (code hoisting etc.) the most.

Exactly. The VE target strictly requires VL <= MVL or you'll get a hardware exception. Enforcing strict UB here means VP users have to explicitly emit the instructions that keep the VL within bounds. This means that we can optimize the VL computation code and that it can be factored into cost calculations, etc. With Options 2 & 3 this would happen only very late in the backend, when most scalar optimizations are already done.
Besides, this still allows you to speculate as long as MVL (as in the UB-causing bound for VL) does not go below VL... could you explain under which circumstance MVL would go below VL by hoisting? This is definitely not the case for static VL targets (x86) and also not for VE.

TODO:
  • Define the behavior for %evl > W.
  • Clarify that W is target-specific.
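
As an illustration of the point above: with UB semantics the clamping code is ordinary IR that scalar passes can see, optimize, and cost. A minimal sketch for a strip-mined loop over %n elements with <8 x double> ops (W = 8; the loop variables and the .f64 mangling are illustrative):

%rem    = sub i32 %n, %i
%small  = icmp ult i32 %rem, 8
%evl    = select i1 %small, i32 %rem, i32 8    ; %evl = min(%rem, 8), so %evl <= W
%sum    = call <8 x double> @llvm.vp.fadd.f64(<8 x double> %x, <8 x double> %y, <8 x i1> %mask, i32 %evl)
%i.next = add i32 %i, %evl
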
llvm/docs/Proposals/VectorPredication.rst
2 ↗(On Diff #228885)

I think so. I'll put the proposal doc up for review.

Exactly. The VE target strictly requires VL <= MVL or you'll get a hardware exception. Enforcing strict UB here means VP users have to explicitly emit the instructions that keep the VL within bounds. This means that we can optimize the VL computation code and that it can be factored into cost calculations, etc. With Options 2 & 3 this would happen only very late in the backend, when most scalar optimizations are already done.

I think I'm lost here. Which thing is VL and which is MVL in this scenario?

Also, the talk about how various hardware treats the relative values of VL and MVL concerns me if either of these is supposed to be the width of the vector passed to this intrinsic. My understanding is that we're supposed to be able to generate vectors of any width we want in IR and the type legalization is responsible for mapping that to vector sizes that are legal for the target. So what does the target requirement mean here?

simoll added a comment. Feb 4 2020, 2:27 AM

Exactly. The VE target strictly requires VL <= MVL or you'll get a hardware exception. Enforcing strict UB here means VP users have to explicitly emit the instructions that keep the VL within bounds. This means that we can optimize the VL computation code and that it can be factored into cost calculations, etc. With Options 2 & 3 this would happen only very late in the backend, when most scalar optimizations are already done.

I think I'm lost here. Which thing is VL and which is MVL in this scenario?

VL == %evl
MVL == W
Sorry for the vector speak :)

Also, the talk about how various hardware treats the relative values of VL and MVL concerns me if either of these is supposed to be the width of the vector passed to this intrinsic. My understanding is that we're supposed to be able to generate vectors of any width we want in IR and the type legalization is responsible for mapping that to vector sizes that are legal for the target. So what does the target requirement mean here?

I agree that, in the end, the semantics will be based solely on IR types. However, what those semantics should look like for the %evl > W case depends on how targets can handle it; we have to make sure that whatever we specify at the IR level is at least reasonable for all targets.

From what I recall, the plan is to implement this by using fixed-size vector types combined with VL-based ops. MVL would be the size of those vector types.

Quoting all of lkcl's email so it ends up in Phabricator:

On Tue, Feb 4, 2020 at 3:48 AM @lkcl wrote:

Exactly. The VE target strictly requires VL <= MVL or you'll get a hardware exception. Enforcing strict UB here means VP users have to explicitly emit the instructions that keep the VL within bounds. This means that we can optimize the VL computation code and that it can be factored into cost calculations, etc. With Options 2 & 3 this would happen only very late in the backend, when most scalar optimizations are already done.

I think I'm lost here. Which thing is VL and which is MVL in this scenario?

VL == %evl
MVL == W
Sorry for the vector speak :)

ah. right. that bit of information was important, simon :) without clarification, i assumed W was the "required vector length at the program loop level", whoops..

I agree that, in the end, the semantics will be based solely on IR types. However, what those semantics should look like for the %evl > W case depends on how targets can handle it; we have to make sure that whatever we specify at the IR level is at least reasonable for all targets.

okaaay, riight, so the purpose of the discussion is, e.g., to work out how to represent things like for-loops in the strcpy example here, is that right?

https://www.sigarch.org/simd-instructions-considered-harmful/

so for %evl > W (i.e. %evl > MVL) in RVV: the very act of trying to *set* %evl to the loop length is retried *in every loop*, and the implementation (in hardware) very very specifically - unbeknownst to the programmer (and to the IR writer) - hard-limits %evl *to* MVL.

to be clear: although the programmer *tries* to set %evl > MVL, this *never happens*: %evl will *always* actually be set to <= MVL.

it's quite clever.

it is really really important - a critical part of the design of RVV loops - that the programmer (or LLVM compiler developer in this case) *not* even know or make any assumptions about what MVL will be. some hardware will actually have MVL equal to 1. some really unbelievably powerful and stupidly expensive hardware might have MVL equal to 65536 (yes really, 65536-wide vector ALUs) and the critical thing is, the assembly code *does not care*. it still works perfectly on both, despite the fact that you have no idea, really, what value MVL is going to be.

SimpleV is different in that you absolutely must explicitly declare, as part of any assembly loops (or any other instructions), precisely and exactly how large MVL is to be. this is because it is an "allocation of the number of scalar registers - from the *scalar* regfile - to be used for the vector operation".

thus, for SimpleV, we do actually need a way in LLVM to represent (set) MVL, because it is quite literally an "explicit reservation of a certain size and number of registers".

think of it as a way to say "hey y'know these upcoming SIMD instructions? yeah, we need to set them to all be of length 8 for this set. then, like, next we need to set all the upcoming SIMD instructions to 16, y'ken". actually they're not SIMD they're vector-ops but you get the idea.

this we do with an *extra* parameter to the SV.SETVL instruction:
https://libre-riscv.org/simple_v_extension/appendix/#index8h1

SV.SETVL a2, t4, 8 # MVL==8

now, *if* we have a way to set MVL (through LLVM-IR), we can *also* use that for doing saving/restoring of entire scalar register files with a single instruction, as well as use it for function call register stack save/restore.

basically when we have control over MVL through LLVM-IR, we get a "LD.MULTI" and "ST.MULTI" instruction "for free" as an accidental side-benefit.

SV.SETMVL #32    ; tells the hardware that vector operations are to use 32 *scalar* regs
SV.LD a0, f0, #8     ; loads registers f0 thru f31 from the address at (a0+8)

for SIMD systems such as x86 and ARM, the only way to keep loops as simple as RVV and SV is to have an instruction which, on the last run through the loop, sets up a predicate mask whilst %evl stays at some fixed width at the SIMD boundary... and thus, despite the SIMD operation still being 4 (or 8, or 16) wide, the elements at the end are left alone (masked out).

without such an instruction (one which sets up the predicate bitmask as not being all 1s on the last loop) you'd have to have a sequence of instructions that effectively do the same job, and those instructions will, clearly, impact performance due to being executed on each and every loop.

unless the above is expressly supported in a single instruction (one equivalent to SETVL which sets up the predicate mask on the last loop), this is, i am sorry to have to use this particular phrase, a dog's dinner approach when compared to variable-run vectorisation, and it's why i keep warning that attempting to add support for fixed-power-of-two %evl in this proposal is not a good idea.

even if you _do_ have such an instruction (or a really really short sequence that's equivalent and does not impact the length of the loop too badly), you are stuck: the assembly code has to use 16-wide SIMD if you want high performance, but then short loops waste ALU resources; yet if you use 4-wide SIMD to stop wasting ALU resources, you can't get high performance. you are screwed both coming and going, and ultimately have to resort to stripmining to properly solve it, and at that point we're *definitely* outside the scope of this proposal [as i understand it].

l.

From what I recall, the plan is to implement this by using fixed-size vector types combined with VL-based ops. MVL would be the size of those vector types.

To be clear, I'm referring specifically to LLVM IR for SimpleV, not for other targets.

OK. I was picturing MVL as some sort of maximum supported by the hardware in some sense or context. I think(?) I've got it now.

So let me ask about how you're picturing this working on targets that don't support these non-fixed vector lengths. The comments from lkcl have me concerned that we're going to be asked to emulate this behavior, which is possible I suppose but probably not the best choice performance wise. Consider this call:

%sum = call <8 x double> @llvm.vp.fadd.f64(<8 x double> %x, <8 x double> %y, <8 x i1> %mask, i32 4)

Frankly, I'd hope never to see such a thing. We talked about using -1 for the %evl argument for targets that don't support variable vector length (is that the right phrase?), but what are we supposed to do if something else is used?

Disregarding the %evl argument for the moment, the x86 type legalizer might lower this as a masked <8 x double> fadd, or it might lower it as two <4 x double> fadd operations, or it might scalarize it entirely. Even if the target hardware supports 512-bit vectors we might choose to lower it as two <4 x double> fadds. Or we might not. The backend currently considers itself to have the freedom to do anything that meets the semantics of the intrinsic. So that brings up the question of whether we will be expected to honor the %evl argument. In this case, it would be fairly trivial to do so. However, the possibility raises a concern about what the code that generated this IR was trying to do and whether it is a reasonable thing to have done for x86 backends.

Basically, I want to actively discourage front ends and optimizations from using the %evl argument in cases where it won't be optimal.

simoll added a comment. Edited Feb 6 2020, 12:18 AM

OK. I was picturing MVL as some sort of maximum supported by the hardware in some sense or context. I think(?) I've got it now.

So let me ask about how you're picturing this working on targets that don't support these non-fixed vector lengths. The comments from lkcl have me concerned that we're going to be asked to emulate this behavior, which is possible I suppose but probably not the best choice performance wise. Consider this call:

%sum = call <8 x double> @llvm.vp.fadd.f64(<8 x double> %x, <8 x double> %y, <8 x i1> %mask, i32 4)

Frankly, I'd hope never to see such a thing. We talked about using -1 for the %evl argument for targets that don't support variable vector length (is that the right phrase?), but what are we supposed to do if something else is used?

For targets that do not support %evl they can say so through TTI and the ExpandVectorPredicationPass will convert it into:

%mask.vl = icmp ult <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7>, (splat <8 x i32> 4)
%mask.new = and <8 x i1> %mask, %mask.vl
%sum = call <8 x double> @llvm.vp.fadd.f64(<8 x double> %x, <8 x double> %y, <8 x i1> %mask.new, i32 -1)

Basically, %evl never hits the X86 backend and can be ignored. The expansion pass implements one, unified, legalization strategy for all non-VL targets, achieving predictable behavior across targets.

Disregarding the %evl argument for the moment, the x86 type legalizer might lower this as a masked <8 x double> fadd, or it might lower it as two <4 x double> fadd operations, or it might scalarize it entirely. Even if the target hardware supports 512-bit vectors we might choose to lower it as two <4 x double> fadds. Or we might not. The backend currently considers itself to have the freedom to do anything that meets the semantics of the intrinsic. So that brings up the question of whether we will be expected to honor the %evl argument. In this case, it would be fairly trivial to do so. However, the possibility raises a concern about what the code that generated this IR was trying to do and whether it is a reasonable thing to have done for x86 backends.

I see two sources for VP intrinsics in code:
1.) Hand-written intrinsic code (if we expose VP as C intrinsics in Clang, and/or somebody directly implements, say, a math library in VP, ...)
We do not claim performance portability for VP code. If your actual target is AVX512 and you use VP intrinsics, do not use the %evl parameter (or know how the expansion pass is going to lower it and exploit that).

2.) Optimization passes and (vectorizing) frontends
Vectorizers/frontends should query TTI to decide whether they should be using %evl.
For VL targets, the loop vectorizer could use %evl to implement tail loop predication (as in the DAXPY example https://www.sigarch.org/simd-instructions-considered-harmful/ , linked by @lkcl).
For non-VL targets, you should make the iteration mask the root mask of all other predicates in the loop and set %evl to -1 (see the sketch below).
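
A minimal sketch of that non-VL pattern (the names and the .f64 mangling are illustrative):

%iter.mask = icmp ult <8 x i32> %vec.iv, %n.splat    ; lanes still inside the trip count
%m         = and <8 x i1> %iter.mask, %user.mask     ; iteration mask is the root of all predicates
%sum       = call <8 x double> @llvm.vp.fadd.f64(<8 x double> %x, <8 x double> %y, <8 x i1> %m, i32 -1)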

Basically, I want to actively discourage front ends and optimizations from using the %evl argument in cases where it won't be optimal.

TTI would tell front ends and optimizations that %evl is a no-go for your target. Is this enough discouragement?

In theory, yes. In practice, it will depend on how optimizations make use of that information. Your explanation of how the ExpandVectorPredicationPass will make this palatable to the backend worries me a little, because it essentially means that optimizations don't have to care that the target doesn't support this feature. They can generate IR that uses it and EVPP will smooth over it. Obviously, we could handle this on a case-by-case basis as it comes up. As you say, TTI will provide sufficient information for passes to make the decision.

2.) Optimization passes and (vectorizing) frontends
Vectorizers/frontends should query TTI to decide whether they should be using %evl.
For VL targets, the loop vectorizer could use %evl to implement tail loop predication (as in the DAXPY example https://www.sigarch.org/simd-instructions-considered-harmful/ , linked by @lkcl).
For non-VL targets, you should make the iteration mask the root mask of all other predicates in the loop and set %evl to -1.

FWIW this is the approach we plan to use at BSC to vectorize for the RISC-V V extension. We're currently adding mask information to VPlan recipes that, when executed, should emit VPred operations with masking. Our plan includes a vplan→vplan transformation that would express the "root" mask as a "set vector length" operation.

lkcl added a comment. Feb 6 2020, 5:07 PM

TTI would tell front ends and optimizations that %evl is a no-go for your target. Is this enough discouragement?

In theory, yes. In practice, it will depend on how optimizations make use of that information. Your explanation of how the ExpandVectorPredicationPass will make this palatable to the backend worries me a little, because it essentially means that optimizations don't have to care that the target doesn't support this feature. They can generate IR that uses it and EVPP will smooth over it. Obviously, we could handle this on a case-by-case basis as it comes up. As you say, TTI will provide sufficient information for passes to make the decision.

ok so it is starting to sink in what is being proposed: a *mainstream* pass in llvm that *always* puts in vector predication, and then various backends, depending on hardware capability, will either have passes that turn that mandatory vector predication into scalar loops, or into SIMD / SIMT (getting rid of %evl in the process), or, in the case of Cray-inspired hardware, into SETVL assembly code.

if that's accurate, then wow that's quite bold and has a lot of advantages.

i have a suggestion. for SimpleV we definitely need to have an explicit way to specify MVL. this is because it is literally specifying precisely how many scalar registers are to be allocated for a vector op.

however for SIMD (ARM, x86, other) i have a suspicion that being able to "hint" the best size of SIMD instruction width to use is probably a good idea.

if a SIMD width hint is available, it happens to be synonymous with SimpleV's (hard) requirement to be able to specify MVL.

a scalar system would ignore both %evl and %mvl (or better %mpvl - max partition vector length), i.e. passes would eliminate them.

a SIMD system would use %mpvl to choose the best SIMD opcodes for the job; the passes would subdivide work into such chunks, then generate the suitable corner-case last loop as well, *ignoring* %evl in the process.

SimpleV would use both to generate opcodes, coordinating with the regfile allocator, correctly and efficiently.

i have a suggestion. for SimpleV we definitely need to have an explicit way to specify MVL. this is because it is literally specifying precisely how many scalar registers are to be allocated for a vector op.

Would it work for you if we leave the definition of MVL for scalable types to the targets?

This would allow you (and ARM MVE/SVE, RISC-V V) to have your own mechanism for setting/querying MVL.
Besides, i think that defining MVL is out of the scope of this RFC given the diversity of scalable vector ISAs right now.. again a point we could revisit should all scalable vector ISAs someday agree on one way to define MVL.

The up-to-date list of planned changes (also for this patch) is here: https://reviews.llvm.org/D69891#1871485

lkcl added a comment. Feb 12 2020, 5:23 AM

i have a suggestion. for SimpleV we definitely need to have an explicit way to specify MVL. this is because it is literally specifying precisely how many scalar registers are to be allocated for a vector op.

Would it work for you if we leave the definition of MVL for scalable types to the targets?

mmm... honestly? probably not. however we can get away with either inline assembler (for a very limited subset of requirements) or just going "y'know what, let's just set MVL hard-coded to default to 4 or 8 for all loops", for now, as best matched to the (planned) maximum internal register read/write ports for our first chip.

This would allow you (and ARM MVE/SVE , RISC-V V) to have their own mechanism for setting/querying MVL.

and x86-for-hinting-the-SIMD-length. [for anyone who may be under the impression that RVV does not need the concept of MVL: see the sub-extension which fits the vector regfile onto the scalar (FP) regfile. if the FP regfile is to be used and useful at the same time, then there needs to be a way to explicitly define how much of the FP regfile is to be allocated to RVV, and that in turn means being able to define the number of "lanes" to actually be used... which is, funnily enough, exactly what *setting* MVL does. N(Lanes) == MVL. MVL == N(Lanes).]

Besides, i think that defining MVL is out of the scope of this RFC given the diversity of scalable vector ISAs right now..

this is cool and exciting.

again a point we could revisit should all scalable vector ISAs someday agree on one way to define MVL.

yes, as a separate proposal.

i have a suggestion. for SimpleV we definitely need to have an explicit way to specify MVL. this is because it is literally specifying precisely how many scalar registers are to be allocated for a vector op.

Would it work for you if we leave the definition of MVL for scalable types to the targets?

mmm... honestly? probably not. however we can get away with either inline assembler (for a very limited subset of requirements) or just going "y'know what, let's just set MVL hard-coded to default to 4 or 8 for all loops", for now, as best matched to the (planned) maximum internal register read/write ports for our first chip.

I think i wasn't clear: what i meant to say is that we will not decide how MVL is defined/queried/set in the scope of this RFC... potentially leading to the situation that every target comes with its own set of target intrinsics to do so.

This would allow you (and ARM MVE/SVE , RISC-V V) to have their own mechanism for setting/querying MVL.

and x86-for-hinting-the-SIMD-length.

For x86 with scalable types, yes. For "classic" SIMD types, MVL == W of <W x type>.

<snip> [for anyone who may be under the impression that RVV does not need the concept of MVL: see the sub-extension which fits the vector regfile onto the scalar (FP) regfile. if the FP regfile is to be used and useful at the same time, then there needs to be a way to explicitly define how much of the FP regfile is to be allocated to RVV, and that in turn means being able to define the number of "lanes" to actually be used... which is, funnily enough, exactly what *setting* MVL does. N(Lanes) == MVL. MVL == N(Lanes).]

Besides, i think that defining MVL is out of the scope of this RFC given the diversity of scalable vector ISAs right now..

this is cool and exciting.

Yep, and we wouldn't get near the level of support for this RFC otherwise.

again a point we could revisit should all scalable vector ISAs someday agree on one way to define MVL.

yes, as a separate proposal.

+1

Exactly. The VE target strictly requires VL <= MVL or you'll get a hardware exception. Enforcing strict UB here means VP users have to explicitly emit the instructions that keep the VL within bounds. This means that we can optimize the VL computation code and that it can be factored into cost calculations, etc. With Options 2 & 3 this would happen only very late in the backend, when most scalar optimizations are already done.

Ok, I didn't realize VE's SETVL works like that. In that case we don't have much of a choice, unfortunately.

Besides, this still allows you to speculate as long as MVL (as in the UB-causing bound for VL) does not go below VL... could you explain under which circumstance MVL would go below VL by hoisting? This is definitely not the case for static VL targets (x86) and also not for VE.

Of course, for lots of IR that we care about in practice, it will be quite simple to see that hoisting is safe, e.g. because:

  • %evl is the constant -1
  • %evl is computed in a way that can be recognized to produce a small enough value (typical strip-mined loops)
  • there are earlier unconditional VP operations with the same EVL value (most vectorized functions)

But you need some such analysis, and must not hoist when those tricks all fail, because there's no general guarantee that the condition you're hoisting out of is independent from "%evl > element count?". A trivial (if pathological) example of this is when the condition is never true in any execution and the EVL value is larger than W. A more real-world example, if you insist, comes from one proposed way to port hand-crafted fixed-width SIMD algorithms to RVV: check at runtime whether vector registers are at least as large as required by the SIMD algorithm, if so set the VL register to a constant and execute vector code, otherwise fall back to another implementation. This might mean having vp.foo(..., i32 4) instructions guarded by a runtime check that effectively determines whether that 4 is a legal value, and hoisting the computation out of the condition introduces UB in the executions where it isn't.

Whether this would lead to any end-to-end miscompilations is another question, but that's not a good excuse to implement known-incorrect optimizations.
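
A sketch of that guarded pattern with scalable types (the nxv4f32 mangling and %W, standing for the runtime element count, are illustrative):

%ok = icmp uge i32 %W, 4
br i1 %ok, label %vec, label %fallback

vec:
%r = call <vscale x 4 x float> @llvm.vp.fadd.nxv4f32(<vscale x 4 x float> %a, <vscale x 4 x float> %b, <vscale x 4 x i1> %m, i32 4)

; hoisting the call above the branch would execute it with %evl = 4 even in
; executions where %W < 4, which is UB under option 1.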

lkcl added a comment. Feb 12 2020, 8:19 AM

I think i wasn't clear: what i meant to say is that we will not decide how MVL is defined/queried/set in the scope of this RFC... potentially leading to the situation that every target comes with its own set of target intrinsics to do so.

ah yes got you.

This would allow you (and ARM MVE/SVE , RISC-V V) to have their own mechanism for setting/querying MVL.

and x86-for-hinting-the-SIMD-length.

For x86 with scalable types, yes. For "classic" SIMD types MVL == W of <W x type>

mmm... i don't believe that's a wise choice / decision / assumption. i am partly-guessing-and-making-architectural-assumptions here: imagine that the (very-well-informed) programmer knows how the pipelines of a particular processor work (and i do mean very well), they know that there are a couple of separate pipelines, one which handles e.g. NxFP32, one which handles MxFP64, but that if you issue SIMD instructions of width N=Mx2, it will result in a "blockage" (stall) and under-utilisation.

*however*... if you issue *half* the workload (i.e. MVL == W/2) for the FP32 instructions interleaved with "full" workload (MVL == W for the FP64 ops), *then*, because of the way that the architecture works, the two suites of instructions *will* go to the separate pipelines, *will* get done in parallel, because you're not overloading the exact same 64-bit-wide pipeline entrypoint as you would have done otherwise... you get what i'm trying to say?

i think what i'm trying to say works better for MMX (the instructions which shared the FP regfile with SIMD instructions, is that right? or is it SSE?) - there you definitely want control over how much of the regfile is allocated to SIMD and how much remains available for scalar-FP usage, and if MVL == W is a hard-coded assumption, with no "hint", you could end up taking up far more of the FP regfile for SIMD MMX than is efficient / effective.

however... if the compiler could be *explicitly* told, "hey i want you to use only W/2 or W/4 worth of the FP regfile for SIMD operations please, and to automatically create a 2x or 4x loop that makes up for it *as if* you had done a full MVL==W single SIMD instruction", then it becomes possible to create a balance there which will not hammer the L1/L2 cache with LD/ST operations, consuming far more power than necessary, because the SIMD instructions completely dominate the entirety of the FP regfile.

we quickly learned from 3D workloads that they are very computationally-intensive and fit a "LD, massive-amounts-of-SIMD-processing, ST" pattern with *very* little in the way of overlaps. consequently, if the compiler generates:

  • LD
  • half-the-processing-because-there's-not-enough-registers
  • ST-some-temps
  • do-some-more-processing
  • LD-out-of-temps, do-a-bit-more-processing
  • ST

this is horribly, horribly power-inefficient.

so being able to balance the workload, keep things entirely in the regfile even if it means using half-wide (or quarter-wide) SIMD ops and the loops taking twice or 4 times longer in order to avoid the spill into temporary LD/STs, this is far more important than trying to make "individual" SIMD operations (ones that consume far too much of the regfile and result in LD/ST "spill") as wide as possible.

again, however: i'm raising this not to suggest that it be part of *this* RFC, i'm just documenting it to make sure it's not forgotten, for later.

Besides, i think that defining MVL is out of the scope of this RFC given the diversity of scalable vector ISAs right now..

this is cool and exciting.

Yep, and we wouldn't get near the level of support for this RFC otherwise.

yehyeh.

i think what i'm trying to say works better for MMX (the instructions which shared the FP regfile with SIMD instructions, is that right? or is it SSE?) - there you definitely want control over how much of the regfile is allocated to SIMD and how much remains available for scalar-FP usage, and if MVL == W is a hard-coded assumption, with no "hint", you could end up taking up far more of the FP regfile for SIMD MMX than is efficient / effective.

MMX does use the X87 FP register file, but they can't coexist at the same time. The first use of MMX marks the X87 register stack as occupied. I can't remember if it alters the data or not. An explicit emms instruction has to be executed at the end of the MMX code to erase the MMX data and make the registers usable for X87 again.

lkcl added a comment. Feb 12 2020, 8:37 AM

But you need some such analysis, and must not hoist when those tricks all fail, because there's no general guarantee that the condition you're hoisting out of is independent from "%evl > element count?". A trivial (if pathological) example of this is when the condition never true in any execution and the EVL value is larger than W. A more real-world example, if you insist, comes from one proposed way to port hand-crafted fixed-width SIMD algorithms to RVV: check at runtime whether vector registers are at least as large as required by the SIMD algorithm, if so set the VL register to a constant and execute vector code,

ah... ah... you can't. at least, the last version of the RVV spec that i read (7?) still explicitly states, "regardless of what *you* want VL to be set to, the *hardware* gets to decide exactly what value *actually* goes into the VL CSR".

the only guarantee you have is that if you set VL to a non-zero value, when you read it immediately after setting, it will be non-zero.

this specifically *does not matter* on RVV (sigh: when RVV is not done on top of the FP regfile, and there is a separate vector regfile), because the vector regfile is specifically designed to refer to *vectors*... not to individual elements.

for SimpleV, because we designed it right from the start to sit on top of the int and fp regfiles, what VL is set to *really does matter*, because it defines precisely and exactly how many of the scalar registers are to be used *as* "vector elements".

thus, for RVV, when converting SIMD assembly patterns to RVV, you absolutely *must* use the "loop pattern" described in https://www.sigarch.org/simd-instructions-considered-harmful/

if you try to hard-code-set VL to anything specific, this has the (unintended) side-effect of destroying the entire paradigm on which RVV is based, namely that you are not *supposed* to know the actual hardware vector "lane" size... at all. so, if you had really minimalist hardware which only *had* one actual "Lane", then if you tried to explicitly set VL=4, that hardware is absolutely hosed, as it is literally unable to support, at the hardware level, the three extra lanes requested/demanded.

this is why you have to "ask" for a VL, and the instruction will put the *actual* number of elements that VL got set to into a destination register, because you need to subtract that number of (processed) elements from the loop.

of course, with the idea of dropping RVV on top of the FP regfile that goes somewhat out the window. however i'm not... welcome, shall we say... in the RV WG participation, so you'd need to take this up with them, directly. and try not to mention my name too much because they're quite likely to sabotage things (to everyone's detriment) just because i was the one that came up with the insights. *shakes head*...

lkcl added a comment. Feb 12 2020, 9:02 AM

MMX does use the X87 FP register file, but they can't coexist at the same. The first use of MMX marks the X87 register stack as occupied. I can't remember if it alters the data or not. An explicit emms instruction has to be done at the end of the MMX code to erase the MMX data and make the registers usable for X87 again.

craig, thank you for correcting me. that makes a lot of sense as i can just imagine the x87 designers going "argh, how are we going to avoid a pipeline clash / mess, here" :)

you get the principle i am sure, even though MMX is not a suitable example.

ah... ah... you can't. at least, the last version of the RVV spec that i read (7?) still explicitly states, "regardless of what *you* want VL to be set to, the *hardware* gets to decide exactly what value *actually* goes into the VL CSR".

the only guarantee you have is that if you set VL to a non-zero value, when you read it immediately after setting, it will be non-zero.

I don't know where you have gotten this idea; it has never been true for as long as I can recall. While RVV implementations have some freedom in how they set VL, there are also lots of rules governing their behavior. Most relevantly, since October 2018 (spec version 0.5-draft), programs requesting something less than or equal to the maximum VL will get exactly that number as VL, not something smaller. And even before that change, there were long-standing significant restrictions on how VL is determined beyond what you claim (see the linked commit).

Furthermore, even if what you said was true, it would not make the scheme I described invalid. VL does not change without the program deliberately executing one of a few instructions that change VL (this is already necessary for any strip-mined loop to work at all). Thus, after executing a SETVL it's enough to inspect the resulting VL to know whether it's safe to execute code that assumes a particular value of VL. More freedom in how VL is determined by the processor just means more possibilities for unnecessarily hitting the fallback path, but that only impacts performance rather than correctness.

lkcl added a comment. Feb 14 2020, 10:33 AM

ah... ah... you can't. at least, the last version of the RVV spec that i read (7?) still explicitly states, "regardless of what *you* want VL to be set to, the *hardware* gets to decide exactly what value *actually* goes into the VL CSR".

the only guarantee you have is that if you set VL to a non-zero value, when you read it immediately after setting, it will be non-zero.

I don't know where you have gotten this idea; it has never been true for as long as I can recall. While RVV implementations have some freedom in how they set VL, there are also lots of rules governing their behavior. Most relevantly, since October 2018 (spec version 0.5-draft), programs requesting something less than or equal to the maximum VL will get exactly that number as VL, not something smaller. And even before that change, there were long-standing significant restrictions on how VL is determined beyond what you claim (see the linked commit).

remember, with the exclusion from discussion due to the anti-trust practices of the RISC-V Foundation, everyone on the "outside" of the RVV working group process has to "reverse-engineer" what the hell is going on. so please do be patient if i make mistakes, as i am not really very happy spending our sponsor's and donor's time (and money) extracting information from the RVV WG in this way (and shouldn't have to).

Furthermore, even if what you said was true, it would not make the scheme I described invalid.

if you are describing replacing a SIMD loop with a *single* instruction, prefixed with a "SETVL", then my understanding is that yes, it would be... *on some hardware*. if the intention is never to be fully-compatible with *all* RVV-compatible hardware, then that's fine.

think it through: imagine some hardware that has only one "lane". that hardware will ONLY have an *absolute* maximum value for MVL: one.

therefore, if you try to set VL to anything greater than 1, it will *only* permit VL to be set to 1.

the variable nature of MVL on a per-implementor basis has caused other problems as well, particularly in the element-offset (VSLIDE?) instructions. it's been a contentious issue.

VL does not change without the program deliberately executing one of a few instructions that change VL (this is already necessary for any strip-mined loop to work at all). Thus, after executing a SETVL it's enough to inspect the resulting VL to know whether it's safe to execute code that assumes a particular value of VL.

ahhh, okaay, right. i get it. so, you'd have:

SETVL a5, 4    # a5 is the dest reg where VL gets stored
if (a5 != 4) {
    go to fallback loop
}

More freedom in how VL is determined by the processor just means more possibilities for unnecessarily hitting the fallback path, but that only impacts performance rather than correctness.

i would argue that even the check itself - having the fallback path at all - impacts performance (and increases code size).

this is why, in SimpleV, we make it mandatory that even if the underlying hardware does not have a large number of lanes, the implementation *must* provide "virtual" hardware - in effect a hardware for-loop. one other processor which does exactly this is the Broadcom VideoCore IV. it gives the *impression* of having a 16-wide FP32 SIMD capability, whereas in fact it only has a 4x FP32 operation and the hardware delays for 4 additional cycles, pushing 4 *sets* of 4x FP32 into the (one) 4-wide FP32 pipeline.

MMX does use the X87 FP register file, but they can't coexist at the same time. The first use of MMX marks the X87 register stack as occupied. I can't remember if it alters the data or not. An explicit emms instruction has to be executed at the end of the MMX code to erase the MMX data and make the registers usable for X87 again.

craig, thank you for correcting me. that makes a lot of sense as i can just imagine the x87 designers going "argh, how are we going to avoid a pipeline clash / mess, here" :)

you get the principle i am sure, even though MMX is not a suitable example.

I don't know about Craig, but I'm not sure I do get the principle. For any given target we have a known maximum vector width (as in total number of bits, not number of elements) that is discoverable through TargetTransformInfo. We also have a "preferred" vector width that gets a default value based on the target architecture, but can be overridden by a command line option and may change what TargetTransformInfo tells you. However, the IR is not bound by these. The optimizer and any front end can generate whatever vectors they like. If some wacky optimization wants to create a <23 x float> vector, that's legal IR. However, when it gets to the backend, the type legalizer is going to do something to break it down into chunks that can be consumed by the processor. To get nicely optimized code, there needs to be cooperation between the optimizer and the backend.

This is why I mentioned before that the discussion of architecture specific details in the context of defining the semantics of the IR is making me nervous. LLVM IR is designed to be target-independent. The VP semantics need to respect that.

That's not to say we can ignore target-specific details. We have two distinct lanes though -- (1) the semantics of the IR, and (2) the mechanisms by which the target details can be discovered so that pre-codegen components can tune the IR for a specific target. We need to make sure the IR semantics are rich enough to represent the details of all targets we intend to support, but the details of the target shouldn't be visible in the IR semantics. Maybe I'm preaching to the choir here. I just want to make sure we're all on the same page. Perhaps this would be cleared up if I had a better understanding of what you were saying.

lkcl added a comment. Feb 14 2020, 2:29 PM

Perhaps this would be cleared up if I had a better understanding of what you were saying.

appreciated. if it's ok, can we schedule that for when it's part of a (new) proposal?

Perhaps this would be cleared up if I had a better understanding of what you were saying.

appreciated. if it's ok, can we schedule that for when it's part of a (new) proposal?

Sure.

simoll updated this revision to Diff 246280. Feb 24 2020, 12:34 PM
  • rebased
  • various fixes
  • includes LangRef rephrasing and test changes to VP integer patch

FYI, the test failures you are seeing here are due to the generalized pattern matching doing a better job at matching the fsub idiom for fneg. The required test changes are included in https://reviews.llvm.org/D75467 .

simoll updated this revision to Diff 250207. Edited Mar 13 2020, 7:59 AM
  • Rebased
  • %evl <= W or UB ensues
  • fixed LangRef wording

The VP integer intrinsic patch has been accepted. I'll wait until next week with the commit to leave time for comments.

FYI, the VP-integer intrinsics & langref patch is in. Next up: expansion to standard SIMD IR. I'll announce the next patch also on llvm-dev when it's on phabricator.

simoll planned changes to this revision. Aug 28 2020, 5:16 AM
mdchen added a subscriber: mdchen. Oct 13 2020, 4:46 AM
Kazhuu added a subscriber: Kazhuu. Oct 19 2020, 7:16 AM
xmj added a subscriber: xmj. Oct 20 2020, 11:03 PM

Hi @simoll: a quick question regarding vp.load/vp.store/vp.gather/vp.scatter. Does the current definition of VPred allow for something similar to the !nontemporal metadata of regular load/store instructions? I don't see any explicit mention of that, but maybe it is already possible using metadata or some other annotation?

Thanks!

rkruppe removed a reviewer: rkruppe. Dec 2 2020, 9:08 AM
rkruppe removed a subscriber: rkruppe.

Hi @simoll: a quick question regarding vp.load/vp.store/vp.gather/vp.scatter. Does the current definition of VPred allow for something similar to the !nontemporal metadata of regular load/store instructions? I don't see any explicit mention of that, but maybe it is already possible using metadata or some other annotation?

This is the first time I've heard of !nontemporal metadata. I'd be absolutely in favor of supporting this in VP mem ops too!

@hussainjk I don't think we need to support non-temporal MD hints right from the start (we can tack on the metadata later), but it'd be great to have a vp.load/store patch with just the intrinsics on Phabricator, to start discussions like this and make progress on VP mem ops.

khchen added a subscriber: khchen. Dec 4 2020, 7:13 AM
troyj added a subscriber: troyj. Jan 22 2021, 7:27 AM

FYI: there is a biweekly sync-up call on VP (Tue, 3pm CET; next: 2021-05-11).

Minutes (with zoom link): https://docs.google.com/document/d/1q26ToudQjnqN5x31zk8zgq_s0lem1-BF8pQmciLa4k8/edit?usp=sharing

Contact me, if you want to join our Discord server.

Herald added a project: Restricted Project. Apr 14 2022, 5:47 AM
pshung added a subscriber: pshung. May 3 2023, 11:48 PM
evandro removed a subscriber: evandro. Aug 17 2023, 5:08 PM