This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
include/llvm/
-
llvm/
-
Bitcode/
-
LLVMBitCodes.h
-
CodeGen/
-
ISDOpcodes.h
-
SelectionDAG.h
-
SelectionDAGNodes.h
-
IR/
-
Attributes.td
-
EVLBuilder.h
-
IntrinsicInst.h
4/4
Intrinsics.td
-
Target/
-
TargetSelectionDAG.td
-
lib/
-
AsmParser/
-
LLLexer.cpp
-
LLParser.cpp
-
LLToken.h
-
Bitcode/
-
Reader/
-
BitcodeReader.cpp
-
Writer/
-
BitcodeWriter.cpp
-
CodeGen/SelectionDAG/
-
SelectionDAG/
-
DAGCombiner.cpp
-
LegalizeIntegerTypes.cpp
-
LegalizeTypes.h
-
SelectionDAG.cpp
-
SelectionDAGBuilder.h
-
SelectionDAGBuilder.cpp
-
SelectionDAGDumper.cpp
-
SelectionDAGISel.cpp
-
IR/
-
Attributes.cpp
-
CMakeLists.txt
-
EVLBuilder.cpp
-
IntrinsicInst.cpp
-
Verifier.cpp
-
Transforms/Utils/
-
Utils/
-
CodeExtractor.cpp
-
test/
-
Bitcode/
-
attributes.ll
-
Verifier/
-
evl_attribs.ll
-
utils/TableGen/
-
TableGen/
-
CodeGenIntrinsics.h
-
CodeGenTarget.cpp
-
IntrinsicEmitter.cpp

Differential D53613

RFC: Explicit Vector Length Intrinsics and Attributes
Needs ReviewPublic

Authored by simoll on Oct 23 2018, 2:45 PM.

Download Raw Diff

Details

Reviewers

rengolin
jdoerfert

Summary

This is a proposal to add vector intrinsics and function attributes to LLVM IR to better support predicated vector code, including targets with a dynamic vector length (RISC-V V, NEC SX-Aurora).
The attributes are designed to simplify automatic vectorization and optimization of predicated data flow. Non-predicating SIMD architectures should benefit from these changes as well through a common legalization scheme (eg lowering of fdiv in predicated contexts).

This is a follow up on my tech talk at the 2018 LLVM DevMtg, "Stories from RV..." (https://llvm.org/devmtg/2018-10/talk-abstracts.html#talk22), and the subsequent discussions at the round table.

Rationale

LLVM IR does not support predicated execution as a first-order concept. Instead there is a growing body of intrinsics (llvm.masked.*) and workarounds (select for arithmetic, VectorABI for general function calls), which encode or at least emulate predication in their respective context. The discussions and patches for LLVM-SVE show that there is a need to accomodate architectures with a Dynamic Vector Length (RISC-V V extension, NEC SX-Aurora TSUBASA).

This RFC provides a coherent set of intrinsics and attributes that enable predication through bit masks and EVL in LLVM IR.

Proposed changes

Intrinsics

We propose to add a new set of intrinsics to the "llvm.evl.*" prefix. After the change, it will include the following operations:

all standard binary (Add, FAdd, Sub, FSub, Mul, FMul, UDiv, SDiv, FDiv, URem, SRem, FRem)
logical operators (Shl, LShr, AShr, And, Or, Xor)
experimental reduce (fadd, fmul, add, mul, and, or, xor, smax, smin, uman, umin, fmax, fmin)
ICmp, FCmp
Select
All of llvm.masked.* namespace (load, store, gather, scatter, expandload, compressstore)

All of the intrinsics in the llvm.evl namespace take in two predicating parameters: a mask of bit vector type (eg <8 x i1>) and a explicit vector length value (i32).

Attributes

We propose three new attributes for function parameters:
mask: this parameter encodes the predicate of this operation. Inputs on unmasked lanes must not affect enabled result lanes in any way.
vlen: this parameter encodes the explicit vector length (VL) of the instruction. The operation does not apply for lanes beyond this parameter. The result for lanes >= vlen is "undef".
passthru: lanes of this parameter are returned where the same lane in the mask is false. This only applies to lanes below the value of the <vlen> parameter (if there is one). This is to be used on general IR functions.

We show the semantics in the example below.
The attributes are intended for general use in IR functions, not just the EVL intrinsics.

An example

Let the predicated fdiv have the following signature:

llvm.evl.fdiv.v4f64(<4 x double> %a, <4 x double> %b, <4 x i1> mask %mask, i32 vlen %dynamic)

Consider this application of fdiv:

llvm.evl.fdiv.v4f64(<4 x double> <4.2, 6.0, 1.0, 1.0>, <4 x double> <0.0, 3.0, nan, 0>, <4 x i1> <0, 1, 1, 1>, 2)
== <undef, 2.0, undef, undef>

The first %mask bit is '0' and the operation will yield undef for the first lane.
The second %mask bit is '1' and so the result on the second lane is just 6.0 / 3.0.
The last two lanes are beyond the explicit vector length %vlen and so their results are undef regardless of passthru.

Lowering

We show possible lowering strategies for the following prototypical SIMD ISAs:

LLVM-SVE with predication and dynamic vector length (RISC-V V extension, NEC SX-Aurora)

For these targets, the intrinsics map over directly to the ISA.

Lowering for targets w/o dynamic vector length (AVX512, ARM SVE, ..)

ARM SVE does not feature a dynamic vector length register.
Hence, the vector length needs to be promoted to the bit mask predicate, shown here for a LLVM-SVE target:

Block before legalization:

..
foo (..., %mask, %dynamic_vl)
...

After legalization:

%vscale32 = call i32 @llvm.experimental.vector.vscale.32()
...
%stepvector = call <scalable 4 x i32> @llvm.experimental.vector.stepvector.nxv4i32()
%vl_mask = icmp <scalable 4 x i1> %stepvector, %stepvector, %dynamic_vl
%new_mask = and <scalable 4 x i1> %mask, %vl_mask
foo (..., <scalable 4 x i1> %new_mask, i32 %vscale32)
...

Lowering for fixed-width SIMD w/o predication (SSE, NEON, AdvSimd, ..)

Scalarization and/or speculation on a full predicate.

Example 1: safe fdiv

int foo(double * A, double * B, int n) {

#pragma omp simd simdlen(8)
for (int i = 0; i < n; ++i) {
  double a = A[i];
  double r = a;
  if (a > 0.0) {
    r = 42.0 / a;
  }
  B[i] = r;
}

}

<8 x double> @llvm.evl.fdiv.v8f64(<8 x f64> %a, <8 x f64> %b, <8 x i1> mask %mask, i32 vlen %length)

vector.body:                                      ; preds = %vector.body, %vector.ph
  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %0 = getelementptr inbounds double, double* %A, i64 %index
  %1 = bitcast double* %0 to <8 x double>*
  %wide.load = load <8 x double>, <8 x double>* %1, align 8, !tbaa !2
  %2 = fcmp ogt <8 x double> %wide.load, zeroinitializer

  ; variant that LV generates today:
  ; %3 = fdiv <8 x double> <double 4.200000e+01, double 4.200000e+01, double 4.200000e+01, double 4.200000e+01, double 4.200000e+01, double 4.200000e+01, double 4.200000e+01, double 4.200000e+01>, %wide.load
  ; %4 = select <8 x i1> %2, <8 x double> %3, <8 x double> %wide.load

  ; using EVL:
  %4 = call <8 x double> @llvm.evl.fdiv.v8f64(<8 x double> <double 4.200000e+01, double 4.200000e+01, double 4.200000e+01, double 4.200000e+01, double 4.200000e+01, double 4.200000e+01, double 4.200000e+01, double 4.200000e+01>, %wide.load, <8 x i1> %2, i32 8)

  %5 = getelementptr inbounds double, double* %B, i64 %index
  %6 = bitcast double* %5 to <8 x double>*
  store <8 x double> %4, <8 x double>* %6, align 8, !tbaa !2
  %index.next = add i64 %index, 8
  %7 = icmp eq i64 %index.next, %n.vec
  br i1 %7, label %middle.block, label %vector.body, !llvm.loop !6

Pros & Cons

Pros

The generality of the intrinsics simplifies the job of the vectorizer's widening phase (speaking for RV, should apply to LV/VPlan as well): Scalar instruction opcodes only need to be mapped over to their respective evl intrinsic name. The mask and vlen are passed to the annotated arguments.

Regarding the evl intrinsics (instead of extending the IR):

The new predication scheme is completely optional and does not interfere with LLVM's vector instructions at all.
Existing backends can use a generic lowering scheme from evl to "classic" vector instructions.
Likewise, lifting passes can convert classic vector instructions to the new intrinsics if deemed beneficial for backend implementation (NEC SX-Aurora, RISC-V V(?)..)

Marking out the mask and the vlen parameters with attributes has the following advantages:

Analyses and optimizations understand the flow of predicates from a quick glance at the functions' attributes, no further knowledge about the functions' internals is required.
Dynamic vlen and the vector mask may be treated specially in the target's CC (eg by passing dynamic vlen in a VL register, or the active mask in a dedicated register (AMDGPU(?))).
Legalization does not have to know the nature of the intrinsic to legalize dynamic vlen where it is not supported.

Cons

Intrinsic bloat.
Predicating architectures without dynamic vector length have to pass in a redundant vlen to exploit these intrinsics.
Some of LLVM's optimizations that need to understand the nature of the intrinsics` semantics, like InstCombine, need to be taught about evl intrinsics to be able to optimize them. This will require at least some engineering effort.

Diff Detail

Repository

rL LLVM

Build Status

Buildable 27192
Build 27191: arc lint + arc unit

Event Timeline

simoll created this revision.Oct 23 2018, 2:45 PM

simoll created this object with visibility "No One".

simoll retitled this revision from RFC: Predicated Vector Intrinsics to RFC: Dynamic Vector Length Intrinsics and Attributes.Oct 23 2018, 2:47 PM

simoll changed the visibility from "No One" to "Public (No Login Required)".

Herald added subscribers: llvm-commits, PkmX, dexonsmith and 8 others. · View Herald TranscriptOct 23 2018, 2:47 PM

This patch is just for reference: it implements the three attributes and (some) intrinsic declarations.

Limiting this to the LLVM-SVE round table for now. We can bring this to llvm-dev when there is agreement on the overall design.

Hi Simon,

thanks for contributing this proposal.

Do you think it is orthogonal for the cases, like in RISC-V V-extension, where we may not want to commit to a specific simdlen? I'm thinking of a vectorisation scheme similar to the one described here https://content.riscv.org/wp-content/uploads/2018/05/15.20-15.55-18.05.06.VEXT-bcn-v1.pdf (slides 22 to 41). (I understand it still needs some "variable-length vector" but there are proposals already in this area).

Regards

tschuett added a subscriber: tschuett.Oct 23 2018, 11:10 PM

Hi Roger,

good to see RISC-V people getting back on this! We couldn't get a hold of anybody working on the V extension at the DevMtg.

Yes, the changes in this RFC are compatible with a physical simdlen that is unknown at compile time.

Regarding the re-configurable MVL of the V extension If this scheme here were adopted, it would definitely work with the Fixed-MVL-Per-Function approach that @rkruppe presented.

Here is an idea (...at the risk of side-tracking this RFC): I believe it will still work if a more flexible approach to re-configuring MVL is taken: eg if you used DVL intrinsics for RISC-V V then the dynamic vector length for every call will be derived from some llvm.r5v.getmvl() intrinsic call. You could then pick one common MVL for all DVL intrinsics that derive their dynamicvl from the same call to this intrinsic. More appropriately, the intrinsic's name should then be more like llvm.r5v.configuremvl(). You could then pass down the MVL to callees the same way it is done here for the dynamic vector length (function argument plus attribute). So functions that inherit their MVL from the caller could transparently do so by deriving dynamicvl from that argument (instead of the intrinsic, which would re-roll the MVL die).

Thanks a lot for this proposal! It's very unfortunate I couldn't be at the dev meeting to discuss in person.

This basic approach of intrinsics with both a regular mask parameter and an integer parameter for the number of lanes to process matches what I've been doing for RISC-V and it works well for that purpose, especially for the strip-mined loops @rogfer01 highlighted. I think the same approach should generalize to other architectures and to vectorized code that does not follow the strip-mining style:

The dynamic vector length, if not needed, can be set to the constant -1 (= UINT_MAX)
The mask parameter, if not needed, can be set to the constant <i1 true, i1 true, ...>

Both of these settings are easy to recognize in legalization/ISel and in IR passes, so unpredicated dynamic-vl operations as well as predicated-but-full-length operations can be represented with (in principle) no loss of optimization power and only a little bit of boilerplate. That seems like a good trade off for not having multiple variants of every intrinsic.

I am a bit less sure about the new attributes. If it was just about the intrinsics, I'd argue for creating helper query functions like that extract the relevant arguments from a call or Function object, using knowledge of the intrinsic signatures. But on my third reading of the text I finally realized you want to apply them to non-intrinsincs as well. An example of how each of these would be used (e.g. by RV or an OpenMP implementation) would be useful. I can see the value of passing the dynamic vector length in a specific register, but at a glance, unmasked_return seems rarely applicable to user-defined functions (similarly to the returned parameter attribute, which is a bit niche).

Editorial note: I find the way "unmasked" is used here confusing. You seem to use it for "lanes where the mask bit is 0, which are disabled", but IME "unmasked" means operations with no predication at all and lanes with mask bit 0 are called "disabled" or "masked out" or something to that effect.

PS: Representing RISC-V's MVL/vector configuration as an SSA value returned by and passed to functions is the first thought everyone has, including me, but I've tried extensively and it just can't work. I don't want to be too curt about this, but I'd really prefer to not side-track this RFC with rehashing the reasons why it doesn't work. If you want, Simon, you can start a thread on llvm-dev or email me privately and we can chat about it, but let's keep this thread on-topic.

In D53613#1273846, @rkruppe wrote:

Thanks a lot for this proposal! It's very unfortunate I couldn't be at the dev meeting to discuss in person.

This basic approach of intrinsics with both a regular mask parameter and an integer parameter for the number of lanes to process matches what I've been doing for RISC-V and it works well for that purpose, especially for the strip-mined loops @rogfer01 highlighted. I think the same approach should generalize to other architectures and to vectorized code that does not follow the strip-mining style:

The dynamic vector length, if not needed, can be set to the constant -1 (= UINT_MAX)

The mask parameter, if not needed, can be set to the constant <i1 true, i1 true, ...>

Both of these settings are easy to recognize in legalization/ISel and in IR passes, so unpredicated dynamic-vl operations as well as predicated-but-full-length operations can be represented with (in principle) no loss of optimization power and only a little bit of boilerplate. That seems like a good trade off for not having multiple variants of every intrinsic.

Great, good to hear we are on the same page here.

I am a bit less sure about the new attributes. If it was just about the intrinsics, I'd argue for creating helper query functions like that extract the relevant arguments from a call or Function object, using knowledge of the intrinsic signatures. But on my third reading of the text I finally realized you want to apply them to non-intrinsincs as well. An example of how each of these would be used (e.g. by RV or an OpenMP implementation) would be useful. I can see the value of passing the dynamic vector length in a specific register, but at a glance, unmasked_return seems rarely applicable to user-defined functions (similarly to the returned parameter attribute, which is a bit niche).

Two reasons: first, we want to avoid this kind of hard-coded knowledge about intrinsics and second, the attributes allow you to coalesce vector registers. As a plus they simplify whole-function vectorization with dvl and predication beyond what's currently supported by OpenMP/VectorABI.

Example

Let's say you'd want to vectorize a loop like this for a predicating/dynamicvl architecture:

for (int i =0; i <n; ++i) {
  double x = B[i];
  double y = C[i];
  A[i] = x > 0 ? bar(x) : y;
}

And there were a user-provided (or RV-auto-vectorized) SIMD version of bar with the following signature:

def <scalable 1 x double> @bar_dvl_nxv1(<scalable 1 x double> %a, <scalable 1 x double> unmasked_ret %b, <scalable 1 x i1> mask %mask, i32 dynamicvl %vl) {..}

Crucially, the implementation of @bar may use llvm.dvl intrinsics (or other function calls) internally but there is no way of telling the default return value (for masked-out lanes) without inspecting the IR... and worse you might just be given a declaration of that function.

However, by inspecting just the attributes the loop vectorizer could simply vectorize the call to @bar like below (you'd still need a way to tell vector shapes as in OpenMPs linear,aligned,.. clauses and VectorABI).

for.body.rv:
    %cond = fcmp <scalable 1 x i1> %b, splat 0.0
    %x = call <scalable 1 x double> @llvm.dvl.load(..., %dvl)
    %y = call <scalable 1 x double> @llvm.dvl.load(..., %dvl)
    ...
    %result = call <scalable 1 x double> @bar_dvl_nxv1( %x, %y, %cond, %dvl)
    ...
    llvm.dvl.store(%Aptr, %result, ...)
    ...

The select is folded into the vectorized function call, which is not possible otherwise.
Moreover if RV auto-vectorizes @bar it will automatically annotate the vectorized functions with these attributes for you.

Vector register coalescing

If you have unmasked_ret knowledge about the data flow in you vector code, register allocation can exploit that to safe registers and avoid vector register spills.
Vector values with complementary masks can be coalesced into one register, relying on the fact that the masked-out part will be preserved through all arithmetic and function calls.
This already applies to the earlier example because there is no need to keep %y alive across the call site: instead we know that the return value of call @bar_dvl_nxv1 contains the parts of %y we care about.

Editorial note: I find the way "unmasked" is used here confusing. You seem to use it for "lanes where the mask bit is 0, which are disabled", but IME "unmasked" means operations with no predication at all and lanes with mask bit 0 are called "disabled" or "masked out" or something to that effect.

Sure. Do you have a specific suggestion? How about `maskedout_ret' instead?

PS: Representing RISC-V's MVL/vector configuration as an SSA value returned by and passed to functions is the first thought everyone has, including me, but I've tried extensively and it just can't work. I don't want to be too curt about this, but I'd really prefer to not side-track this RFC with rehashing the reasons why it doesn't work. If you want, Simon, you can start a thread on llvm-dev or email me privately and we can chat about it, but let's keep this thread on-topic.

I can see how transformations may accidentally interleave operations with different MVLs. Let's focus on this RFC for now.

Hi @simoll

Yes, the changes in this RFC are compatible with a physical simdlen that is unknown at compile time.

thanks, this is good to know. Apologies if I side-tracked a little bit the discussion, I was a bit concerned of seeing fixed-size vectors here and I wanted to dispel my doubts.

Regards,

In D53613#1273945, @rogfer01 wrote:

Hi @simoll

Yes, the changes in this RFC are compatible with a physical simdlen that is unknown at compile time.

thanks, this is good to know. Apologies if I side-tracked a little bit the discussion, I was a bit concerned of seeing fixed-size vectors here and I wanted to dispel my doubts.

Regards,

No worries. Since R5V is one of the targets with a dynamic vector length that this RFC is aimed at it is good to make clear that dvl intrinsics will work with R5V's re-configurable MVL.

Is there any further input on this RFC? Otherwise, i will send out an updated version to llvm-dev next week.

egarcia added a subscriber: egarcia.Oct 29 2018, 9:42 AM

I do not think we need to unnecessarily tie this proposal to "dynamic" vector length. These just have "explicit vector length" that is not implied by the operand vector length. May I suggest "evl" instead of "dvl"? Stands for Explicit Vector Length".

In D53613#1273874, @simoll wrote:

In D53613#1273846, @rkruppe wrote:

I am a bit less sure about the new attributes. If it was just about the intrinsics, I'd argue for creating helper query functions like that extract the relevant arguments from a call or Function object, using knowledge of the intrinsic signatures. But on my third reading of the text I finally realized you want to apply them to non-intrinsincs as well. An example of how each of these would be used (e.g. by RV or an OpenMP implementation) would be useful. I can see the value of passing the dynamic vector length in a specific register, but at a glance, unmasked_return seems rarely applicable to user-defined functions (similarly to the returned parameter attribute, which is a bit niche).

Two reasons: first, we want to avoid this kind of hard-coded knowledge about intrinsics and second, the attributes allow you to coalesce vector registers.

I don't really follow. If this information was only attached to intrinsics, then the choice is between specifying the meaning of the arguments once in the intrinsics TableGen file versus specifying it once in a single location in the C++ code. That doesn't seem like a significant difference. Of course, non-intrinsic functions are another matter, so this is entirely hypothetical anyway.

As a plus they simplify whole-function vectorization with dvl and predication beyond what's currently supported by OpenMP/VectorABI.

Example Let's say you'd want to vectorize a loop like this for a predicating/dynamicvl architecture:
for (int i =0; i <n; ++i) {
  double x = B[i];
  double y = C[i];
  A[i] = x > 0 ? bar(x) : y;
}
And there were a user-provided (or RV-auto-vectorized) SIMD version of bar with the following signature:
def <scalable 1 x double> @bar_dvl_nxv1(<scalable 1 x double> %a, <scalable 1 x double> unmasked_ret %b, <scalable 1 x i1> mask %mask, i32 dynamicvl %vl) {..}
Crucially, the implementation of @bar may use llvm.dvl intrinsics (or other function calls) internally but there is no way of telling the default return value (for masked-out lanes) without inspecting the IR... and worse you might just be given a declaration of that function.

However, by inspecting just the attributes the loop vectorizer could simply vectorize the call to @bar like below (you'd still need a way to tell vector shapes as in OpenMPs linear,aligned,.. clauses and VectorABI).
for.body.rv:
    %cond = fcmp <scalable 1 x i1> %b, splat 0.0
    %x = call <scalable 1 x double> @llvm.dvl.load(..., %dvl)
    %y = call <scalable 1 x double> @llvm.dvl.load(..., %dvl)
    ...
    %result = call <scalable 1 x double> @bar_dvl_nxv1( %x, %y, %cond, %dvl)
    ...
    llvm.dvl.store(%Aptr, %result, ...)
    ...
The select is folded into the vectorized function call, which is not possible otherwise.
Moreover if RV auto-vectorizes @bar it will automatically annotate the vectorized functions with these attributes for you.

Thank you for this example. It makes sense, though it still seems like a relatively small win (saves just a few data movement instructions here and there and removes only one overlapping live range). I don't know whether I'd bother introducing an attribute for that, but I won't object.

Vector register coalescing

If you have unmasked_ret knowledge about the data flow in you vector code, register allocation can exploit that to safe registers and avoid vector register spills.
Vector values with complementary masks can be coalesced into one register, relying on the fact that the masked-out part will be preserved through all arithmetic and function calls.
This already applies to the earlier example because there is no need to keep %y alive across the call site: instead we know that the return value of call @bar_dvl_nxv1 contains the parts of %y we care about.

Note that for arithmetic and everything else except calls to user-defined functions, this optimization is feasible to do in the backend without any changes to the IR (I know of one out-of-tree backend doing this, Nyuzi). Only when crossing function boundaries you need the different ABI to be able to ensure the optimization can happen.

Editorial note: I find the way "unmasked" is used here confusing. You seem to use it for "lanes where the mask bit is 0, which are disabled", but IME "unmasked" means operations with no predication at all and lanes with mask bit 0 are called "disabled" or "masked out" or something to that effect.

Sure. Do you have a specific suggestion? How about `maskedout_ret' instead?

That seems good enough for me, further bikeshedding can happen on the mailing list if someone cares enough.

Regarding the "dvl" naming concern @hsaito brought up, in RISC-V we call this concept *active vector length* and I've used that name on llvm-dev in the past. It's a bit more specific than "dynamic" (e.g., one might say "the first n lanes are *active*") and it's applicable to fixed-width SIMD. It could be abreviated to llvm.avl.* (minor name collision in that AVL is sometimes used for the *application* vector length which usually exceeds the vector register size, but c'est la vie).

kmitropo added a subscriber: kmitropo.Oct 29 2018, 2:34 PM

ebrevnov added a subscriber: ebrevnov.Oct 29 2018, 4:11 PM

Hi @simoll ,

thank you for sending this out. I was wondering whether we really want to separate the concepts of vector lane predication and dynamic vector length.

Before explaining what I mean with this, I need to define two concepts. Note that these definitions apply to both to Vector Length Agnostic (<scalable n x type>) or Vector Lenght Fixed (<n x type>) vector types.

Definition 1: Vector Lane Predication (VLP)

For a given operation INST that operates on vector inputs of type <{scalable }n x type>, Vector Lane Predication is the operation of attaching to INST a vector with the same number of lanes {scalable }n, but with (boolean) lanes of type i1 that selects on which lanes the operation INST needs to be executed. This concept can be applied to any IR instruction that have vectors in any of the input operand or in the result.

This can be represented in the language as an additional parameter in form of a vector of i1 lanes.

To achieve VLP, the instruction cold be extended adding the VLP parameter as the last parameter ()this would make predicated INST distinguishable from the non predicated version):

%ret = <{scalable }n x ret_type> INST(<{scalable }n x type1> %in1, <{scalable }n x type2> %in1, ..., <{scalable }n x i1> %VLP)

Definition 2: Dinamic Vector Length (DVL)

For a given operation INST that operates on vector inputs of type <{scalable }n x type>, Dynamic Vector Length is the concept of attaching a run-time value of integer type, say i32 %DVL, that informs the instruction to operate on the first VLP lanes of the vector and discard (or set as undef) the remaining ones.

With this concept, the instruction INST should be extended to accept an additional scalar parameter that would represent the number of active lanes in the operation:

%ret = <{scalable }n x ret_type> INST(<{scalable }n x type1> %in1, <{scalable }n x type2> %in2, ..., i32 %DVL)

DVL, VLP, Vector Length Agnostic (VLA) and Vector Length Specific (VLS) ISAs

To my understanding, DVL is orthogonal to VLA. In principle, you could have a VLA ISA that have an additional register for setting the DVL of the vectors.

DVL as part of VLP

Extending any language to support DVL and VLP would require some form of polymorfism in the language itself, because the same INST would need to have an additional parameter of two different types, the scalar i32 %DVL and the vector <{scalable }n x type> %VLP. I would like to argue that this is not necessary, because DVL is just a specific case of VLP. In fact, the same result of a DVL parameter could be obtained with a VLP parameter that would represent the split of the vector in active and inactive parts.

By using the extension of INST in the VLP definition, we could achieve DVL predication by introducing a new instruction, say VECTOR_SPLIT, that could generate the appropriate predicate partition for the instruction, as follows:

%VLP = <{scalable }n x i1> VECTOR_SPLIT( %i32 DVL)
%ret = <{scalable }n x ret_type> INST(<{scalable }n x type1> %in1, <{scalable }n x type2> %in2, ..., <{scalable }n x i1> %VLP)

Lowering VLP vs lowering DVL

The pattern for lowering the DVL would involve a VECTOR_SPLIT (intrinsic or instruction that would make it generate a single DVL instruction that uses an implicit DVL register. This special VECTOR_SPLIT instruction would be rendered as per-lane predication for non-DVL ISA that support predication), as for example SVE or AVX512.

Please let me know if you don't agree with this reasoning. I might have an incomplete view of the problems introduced by such simplification, but I believe that this approach simplifies the process of introducing predication into the IR because it merges two concepts that (to my understanding, or ignorance!) have been treated separately until now.

Predication: extending IR instructions vs intrinsics

Whether we decide to go via extending the IR over adding new intrinsics, with the reasoning about the DVL and the VLP, we can assume that both cases will require just adding an additional VLP parameter, with, of course, the additional cost of adding a VECTOR_SPLIT instruction or intrinsic.

I personally would prefer to go by adding a new input parameter to the IR instruction (defaulted to "all true" when no predication is used), for two reasons:

avoid the explosion of intrinsics, that would need to be treated separately in all passes.
for experience in extending LLVM IR instrucion to support the scalable vector type, I believe that there is little risk in doing so.

Please understand that this is my personal preference. I understand that the community and also my colleagues at Arm might have a different view on this.

Do we really need to extend the IR?

My gut feeling is that we don't. I think we can ignore the scalable vs fixed vector types, and just consider the pure task of deciding whether we need to perform VLP predication vs DVL predication.Consider the following sequence.

%ret1 = <{scalable }n x ret_type> INST1(<{scalable }n x type1> ..., <{scalable }n x type2> , ...,)
%ret2 = <{scalable }n x ret_type> INST2(<{scalable }n x type1> ..., <{scalable }n x type2> , ...,)
%ret3 = <{scalable }n x ret_type> INST3(<{scalable }n x type1> ..., <{scalable }n x type2> , ...,)
%ret4 = <{scalable }n x ret_type> INST4(<{scalable }n x type1> ..., <{scalable }n x type2> , ...,)

It should be fairly easy to detect whether any SELECT instructions used on the input parameters or return values of the sequence is done using a VLP predication or DVL predication. At Arm, we use this
pattern matching mechanism to lower selects and unpredicated IR instructions into predicated SVE instruction. As far as I know, there are specific cases that might require us switching to a more sophisticated method (like native predication support in IR), but the pattern matching mechanism on select allowed us to cover the majority of the cases that we see.

I agree with @fpetrogalli here that there is some overlap between a "dynamic vector length" i32 %dvl and a mask of the form %m = <i1 0, ..., i1 0, i1 1, ..., i1 1> (or the reverse, if the lanes are to be interpreted in the other direction) where %dvl = llvm.ctpop(%m). As Francesco, puts it, we can always construct %m from %dvl. Perhaps I'm wrong, but I think in the context of strip-mined loops or whole function vectorisation, more elaborated masks that might arise due to control flow would always be subsumed by %m (i.e. will have a strict subset of the lanes enabled by %dvl).

In the original RFC proposal above:

All of the intrinsics in the llvm.dvl namespace take in two predicating parameters: a mask of bit vector type (eg <8 x i1>) and a dynamic vector length value (i32).

Masks seems more general to me which would make me think there is no real need to have intrinsics with %dvl. Perhaps I'm missing something really obvious here?

Kind regards,

Hi @fpetrogalli , @rogfer01 ,

The dynamic vector length is explicit because it crucially impacts the performance of vector instructions for SX-Aurora and RISC-V V (depending on hardware implementation).

SIMD instructions on NEC SX-Aurora execute in a pipelined fashion. While the vector registers hold 256 elements, the SIMD execution units operate on chunks of 32 elements.

Here is an example of two dvl invocations, which compute the same result:

a) llvm_dvl_fadd.v256f64(%x, %y, <full mask>, 13)
Since 13 < 32, the hardware will only issue 1 operation to its SIMD execution units. The occupation is thus something like 13/32 ~ 40%.

b`) llvm_dvl_fadd.v256f64(%x, %y, <mask with first 13 bits set>, 256)
Since the DVL is 256, the hardware will issue 8 operations to its SIMD units. However, only the first 13 elements are relevant leading to an occupation of 13/256 ~ 5%.

By keeping the bit mask and the DVL value separate, DVL intrinsics allow us to make this distinction cleanly on IR level.

ARM SVE does not have a DVL and so it will be lowered to the mask (as it's described in the RFC):

Lowering for targets w/o dynamic vector length (AVX512, ARM SVE, ..)

ARM SVE does not feature a dynamic vector length register.
Hence, the vector length needs to be promoted to the bit mask predicate, shown here for a LLVM-SVE target:

Block before legalization:

..
foo (..., %mask, %dynamic_vl)
...

After legalization:

%vscale32 = call i32 @llvm.experimental.vector.vscale.32()
...
%stepvector = call <scalable 4 x i32> @llvm.experimental.vector.stepvector.nxv4i32()
...
%vl_mask = icmp ult <scalable 4 x i1> %stepvector, %dynamic_vl
%new_mask = and <scalable 4 x i1> %mask, %vl_mask
foo (..., <scalable 4 x i1> %new_mask, i32 %vscale32)
...

With the semantics defined in @simoll's proposal, the active vector length is actually subtly different from predication in that the former makes some lanes undef while predication takes the lane value from another parameter. I actually don't know what motivates this, in RISC-V masked-out lanes and lanes beyond VL are treated the same and this seems the most consistent choice in any ISA that has both concepts (and ISAs that only have predication would legalize the latter with predication so they too would treat all lanes the the same). Is there an architecture I'm not aware of that makes past-VL lanes undef but leave masked-out lanes undisturbed?

Ignoring that difference for the rest of this post, it's true that you can implement "active vector length"-style loop control with just predication. Arm SVE even has dedicated instructions for generating these sorts of masks. So functionality-wise there would be no problem. However, when compiling for an architecture that has a vector length mechanism in hardware, we want to be able to reliably use it, not badly emulate it with the predication mechanism. If AVL is kept separate from masks, that is trivial. If the two are intermingled, code quality heavily depends on how good the backend is at disentangling the instructions computing the mask and separating it into mask_for_avl(%n) & ordinary_mask. That isn't too hard if every mask computation is in the canonical form emitted by the vectorizer, but experience shows that complex canonical forms tend to be mangled by optimizations before they reach the backend. For example, we run InstCombine after LV and that can do a whole lot to a tree composed of bitwise operators and to the selects they feed into.

To be clear, I've not (yet) tried it out, tried hard, and found that it actually doesn't work well enough. This is just educated speculation. But it seems like a plausible enough problem to me that it outweighs the complexity of an extra argument/concept (which isn't all that big, since anyone who doesn't care about active vector lengths can still understand the intrinsics in terms of masking).

Hi @simoll

Here is an example of two dvl invocations, which compute the same result:

a) llvm_dvl_fadd.v256f64(%x, %y, <full mask>, 13)
Since 13 < 32, the hardware will only issue 1 operation to its SIMD execution units. The occupation is thus something like 13/32 ~ 40%.

b`) llvm_dvl_fadd.v256f64(%x, %y, <mask with first 13 bits set>, 256)
Since the DVL is 256, the hardware will issue 8 operations to its SIMD units. However, only the first 13 elements are relevant leading to an occupation of 13/256 ~ 5%.

Aha I see.

I guess we anticipate the compiler would not be able to tell that a mask corresponds to a dvl if we chose not to represent it. I can imagine this happening to a function vectorised with an arbitrary mask, does this align with your expectations too?

Thanks a lot for the clarification.

In D53613#1281895, @rkruppe wrote:

With the semantics defined in @simoll's proposal, the active vector length is actually subtly different from predication in that the former makes some lanes undef while predication takes the lane value from another parameter. I actually don't know what motivates this, in RISC-V masked-out lanes and lanes beyond VL are treated the same and this seems the most consistent choice in any ISA that has both concepts (and ISAs that only have predication would legalize the latter with predication so they too would treat all lanes the the same). Is there an architecture I'm not aware of that makes past-VL lanes undef but leave masked-out lanes undisturbed?

With the current unmasked_ret semantics, we know exactly the defined range of the result vector because all lanes beyond the dynamicvl argument are undef.
This means that the backend only needs to spill registers up to that value. This matters a lot for wide SIMD architectures like the SX-Aurora (and ARM SVE btw..) where one full vector register comes in at 256x8 byte.

Excess lane semantics

While i expect this to be a rare event, i agree that still you might want to be able to preserve those excess (== beyond dvl in defining instruction`) lanes.
For regular function calls, semantics of excess lanes will depend on the calling convention (so there is some interplay between unmasked_ret semantics and the CC).
For dvl intrinsics, i see the following approach:

; %r is defined from [0, .. 42)
%r = llvm.dvl.fma.nxv1f64(%a, %b, <full mask>, 42)

; %full is defined from [0, .., %MVL)
%full = llvm.dvl.compose.nxv1f64(%a, %r, 42, %MVL)

declare @llvm.dvl.compose.nxv1f64(.. %a, .. %b, %split, i32 dynamicvl  %MVL)

The semantics of llvm.dvl.compose would be like select with lane idx: all lanes beyond '42' are taken from %a while the lanes >= 42 are taken from '%r'.
The advantage here is that the defined range of the result of llvm.dvl.compose is still encoded using the dynamicvl attribute.
The backend can fold this pattern down into the appropriate excess-preserving instruction.

Ignoring that difference for the rest of this post, it's true that you can implement "active vector length"-style loop control with just predication.

Lanes beyond dvl are undef... no need to "ignore" that difference, semantics allows for any value on those lanes.

In D53613#1281938, @rogfer01 wrote:

Hi @simoll

Here is an example of two dvl invocations, which compute the same result:

a) llvm_dvl_fadd.v256f64(%x, %y, <full mask>, 13)
Since 13 < 32, the hardware will only issue 1 operation to its SIMD execution units. The occupation is thus something like 13/32 ~ 40%.

b`) llvm_dvl_fadd.v256f64(%x, %y, <mask with first 13 bits set>, 256)
Since the DVL is 256, the hardware will issue 8 operations to its SIMD units. However, only the first 13 elements are relevant leading to an occupation of 13/256 ~ 5%.

Aha I see.

I guess we anticipate the compiler would not be able to tell that a mask corresponds to a dvl if we chose not to represent it. I can imagine this happening to a function vectorised with an arbitrary mask, does this align with your expectations too?

Thanks a lot for the clarification.

Yep, a call boundary would be the ultimate limit to the inferring-dvl-from-mask approach.

In D53613#1281974, @simoll wrote:

In D53613#1281895, @rkruppe wrote:

With the semantics defined in @simoll's proposal, the active vector length is actually subtly different from predication in that the former makes some lanes undef while predication takes the lane value from another parameter. I actually don't know what motivates this, in RISC-V masked-out lanes and lanes beyond VL are treated the same and this seems the most consistent choice in any ISA that has both concepts (and ISAs that only have predication would legalize the latter with predication so they too would treat all lanes the the same). Is there an architecture I'm not aware of that makes past-VL lanes undef but leave masked-out lanes undisturbed?

With the current unmasked_ret semantics, we know exactly the defined range of the result vector because all lanes beyond the dynamicvl argument are undef.
This means that the backend only needs to spill registers up to that value. This matters a lot for wide SIMD architectures like the SX-Aurora (and ARM SVE btw..) where one full vector register comes in at 256x8 byte.

Spilling only the useful prefix of each vector is important, but I don't think we need to change the IR intrinsics' semantics to enable that. I've sketched an analysis that determines the demanded/consumed vector lengths of each vector value (on MIR in SSA form). With this information the backend can do the same optimization whenever the lanes beyond VL are not ever actually observed. This information is already necessary for many reasons other than spilling, such as implementing regular full-width vector operations (i.e., pretty much everything aside from the intrinsics we discuss here) that can sneak into the IR, or even ordinary register copies (on RISC-V at least). Normally I'd be hesitant to staking such an important aspect of code quality on a best-effort analysis, but in this case it seems very feasible to have very high precision:

All the sinks which normally let values escape and thus force "demanded-X" style analyses to be conservative (stores, calls, etc.) are restricted with a VL, so we don't need to worry about other code using the higher lanes
In natural code (strip-mined loops or vectorized functions), most instructions trivially have the same VL (same SSA value) -- we don't need algebraic transformations or anything to show that two instructions access the same number of lanes
When the vector length does change in the middle of a computation, it typically becomes monotonically smaller in a fairly obvious way (e.g., speculative loads for search loop vectorization, or early exits in functions), so we don't need to worry about later instructions accessing elements that an earlier definition didn't write to
In the case where two instructions have completely unrelated vector lengths, they typically belong to two separate loops, between which you normally don't have any (vector-shaped) data flow to begin with, so the problem of what to do with the disabled lanes doesn't even arise

I haven't been able to evaluate this idea yet (partly because I don't have a complete enough compiler to compile benchmarks and see what happens) but for the above reasons I am quite optimistic that it will be enough to spill&fill vectors only up to VL, and solve other problems in the same stroke.

Aside: the advantage for spilling I see is just less memory traffic, I don't think you can make stack frames smaller since you generally won't have an upper bound on the vector lengths and recomputing frame layout every time the active vector length changes seems both very difficult for the backend and at best performance-neutral, more likely causing slowdowns.

Excess lane semantics While i expect this to be a rare event, i agree that still you might want to be able to preserve those excess (== beyond dvl in defining instruction`) lanes. For regular function calls, semantics of excess lanes will depend on the calling convention (so there is some interplay between unmasked_ret semantics and the CC). For dvl intrinsics, i see the following approach:
; %r is defined from [0, .. 42)
%r = llvm.dvl.fma.nxv1f64(%a, %b, <full mask>, 42)

; %full is defined from [0, .., %MVL)
%full = llvm.dvl.compose.nxv1f64(%a, %r, 42, %MVL)

declare @llvm.dvl.compose.nxv1f64(.. %a, .. %b, %split, i32 dynamicvl  %MVL)
The semantics of llvm.dvl.compose would be like select with lane idx: all lanes beyond '42' are taken from %a while the lanes >= 42 are taken from '%r'.
The advantage here is that the defined range of the result of llvm.dvl.compose is still encoded using the dynamicvl attribute.
The backend can fold this pattern down into the appropriate excess-preserving instruction.

I'm not really sure what use the dynamicvl attribute on %MVL is if that parameter is always MVL. I guess you envision also using this intrinsic to stitch together shorter vectors? That's not an operation I've encountered before.

Thinking some more about it, a more convincing reason to keep it that way (to me) is that different architectures might have different behavior for masking and lanes-beyond-VL. RISC-V V, for a long time, wanted to zero lanes in both cases rather than keeping the old value around. For masking, merging with an "old" value is usually done because the application needs it, so generating an explicit merge instruction in those cases is probably fine, but having to copy over the completely irrelevant higher lanes would be pretty bad. The demanded-elements analysis mentioned above could help with this, though.

Ignoring that difference for the rest of this post, it's true that you can implement "active vector length"-style loop control with just predication.

Lanes beyond dvl are undef... no need to "ignore" that difference, semantics allows for any value on those lanes.

You're right, in that direction everything's fine, I was just hesitant to equate the two approaches to loop control entirely because (if vector length-predication makes lanes undef) the opposite direction doesn't work.

In D53613#1282063, @rkruppe wrote:

In D53613#1281974, @simoll wrote:

In D53613#1281895, @rkruppe wrote:

With the semantics defined in @simoll's proposal, the active vector length is actually subtly different from predication in that the former makes some lanes undef while predication takes the lane value from another parameter. I actually don't know what motivates this, in RISC-V masked-out lanes and lanes beyond VL are treated the same and this seems the most consistent choice in any ISA that has both concepts (and ISAs that only have predication would legalize the latter with predication so they too would treat all lanes the the same). Is there an architecture I'm not aware of that makes past-VL lanes undef but leave masked-out lanes undisturbed?

With the current unmasked_ret semantics, we know exactly the defined range of the result vector because all lanes beyond the dynamicvl argument are undef.
This means that the backend only needs to spill registers up to that value. This matters a lot for wide SIMD architectures like the SX-Aurora (and ARM SVE btw..) where one full vector register comes in at 256x8 byte.

Spilling only the useful prefix of each vector is important, but I don't think we need to change the IR intrinsics' semantics to enable that. I've sketched an analysis that determines the demanded/consumed vector lengths of each vector value (on MIR in SSA form). With this information the backend can do the same optimization whenever the lanes beyond VL are not ever actually observed. This information is already necessary for many reasons other than spilling, such as implementing regular full-width vector operations (i.e., pretty much everything aside from the intrinsics we discuss here) that can sneak into the IR, or even ordinary register copies (on RISC-V at least). Normally I'd be hesitant to staking such an important aspect of code quality on a best-effort analysis, but in this case it seems very feasible to have very high precision:

Actually, you could translate regular vector code to EVL intrinsics first and have your backend only work on that. This is the route we are aiming for with the SX-Aurora SVE backend. We propose undef-on-excess-lanes as the default semantics of dynamicvl. There is no special interpretation nor a change for IR intrinsics' semantics.

All the sinks which normally let values escape and thus force "demanded-X" style analyses to be conservative (stores, calls, etc.) are restricted with a VL, so we don't need to worry about other code using the higher lanes

I disagree. You brought up regular vector instructions yourself: what happens if a full vector store (non-EVL) goes to a buffer and that buffer is then passed to a scalar function bar(double * rawData)? You have no idea which lanes bar is going to access. If the store value is defined with an explicit %dvl and undef on excess, you will know, however.

In natural code (strip-mined loops or vectorized functions), most instructions trivially have the same VL (same SSA value) -- we don't need algebraic transformations or anything to show that two instructions access the same number of lanes

This applies to both interpretations of excess lanes.

When the vector length does change in the middle of a computation, it typically becomes monotonically smaller in a fairly obvious way (e.g., speculative loads for search loop vectorization, or early exits in functions), so we don't need to worry about later instructions accessing elements that an earlier definition didn't write to

This may hold for some, basic vectorization schemes but it is too restrictive for more advanced techniques: e.g."FlexVec: auto-vectorization for irregular loops.", PLDI '16 or Dynamic SIMD Vector Lane Scheduling where new workitems are pulled into a spinning loop to avoid reducing the DVL whenever a thread drops out but you may decide to only do so if the occupation drops below a certain threshold.
RV already implements a transformation of this kind for regular LLVM IR and it will do so using the intrinsics and attributes we propose here.

Aside: the advantage for spilling I see is just less memory traffic, I don't think you can make stack frames smaller since you generally won't have an upper bound on the vector lengths and recomputing frame layout every time the active vector length changes seems both very difficult for the backend and at best performance-neutral, more likely causing slowdowns.

The impact can be significant if the differences is a spill+reload of 256x8bytes worth of data. Besides, you could actually avoid memory all together, e.g. by compressing to vectors of size MVL/2 into one.

Excess lane semantics While i expect this to be a rare event, i agree that still you might want to be able to preserve those excess (== beyond dvl in defining instruction`) lanes. For regular function calls, semantics of excess lanes will depend on the calling convention (so there is some interplay between unmasked_ret semantics and the CC). For dvl intrinsics, i see the following approach:
; %r is defined from [0, .. 42)
%r = llvm.dvl.fma.nxv1f64(%a, %b, <full mask>, 42)

; %full is defined from [0, .., %MVL)
%full = llvm.dvl.compose.nxv1f64(%a, %r, 42, %MVL)

declare @llvm.dvl.compose.nxv1f64(.. %a, .. %b, %split, i32 dynamicvl  %MVL)
The semantics of llvm.dvl.compose would be like select with lane idx: all lanes beyond '42' are taken from %a while the lanes >= 42 are taken from '%r'.
The advantage here is that the defined range of the result of llvm.dvl.compose is still encoded using the dynamicvl attribute.
The backend can fold this pattern down into the appropriate excess-preserving instruction.
I'm not really sure what use the dynamicvl attribute on %MVL is if that parameter is always MVL. I guess you envision also using this intrinsic to stitch together shorter vectors? That's not an operation I've encountered before.

dynamicvl %MVL is the canonical way to express this without adding hard-coded knowledge. Lowering evl.compose is cheap on ARM-SVE and SX-Aurora (1. generate mask, 2. blend).

Thinking some more about it, a more convincing reason to keep it that way (to me) is that different architectures might have different behavior for masking and lanes-beyond-VL. RISC-V V, for a long time, wanted to zero lanes in both cases rather than keeping the old value around. For masking, merging with an "old" value is usually done because the application needs it, so generating an explicit merge instruction in those cases is probably fine, but having to copy over the completely irrelevant higher lanes would be pretty bad. The demanded-elements analysis mentioned above could help with this, though.

With undef-on-excess-lanes, you could actually implement a demanded-elements analysis on IR level. You'd then store your findings to lower the dynamicvl %dvl argument of the producing instruction. This may have positive interactions with other parts of LLVM as well. For example, in legalization for non-dvl targets it could mean that a EVL intrinsic could be pruned to the native vector length, implying that it lowered to a plain vector instruction (without any DVL/MVL loop).

In D53613#1282096, @simoll wrote:

In D53613#1282063, @rkruppe wrote:

Spilling only the useful prefix of each vector is important, but I don't think we need to change the IR intrinsics' semantics to enable that. I've sketched an analysis that determines the demanded/consumed vector lengths of each vector value (on MIR in SSA form). With this information the backend can do the same optimization whenever the lanes beyond VL are not ever actually observed. This information is already necessary for many reasons other than spilling, such as implementing regular full-width vector operations (i.e., pretty much everything aside from the intrinsics we discuss here) that can sneak into the IR, or even ordinary register copies (on RISC-V at least). Normally I'd be hesitant to staking such an important aspect of code quality on a best-effort analysis, but in this case it seems very feasible to have very high precision:

Actually, you could translate regular vector code to EVL intrinsics first and have your backend only work on that. This is the route we are aiming for with the SX-Aurora SVE backend. We propose undef-on-excess-lanes as the default semantics of dynamicvl. There is no special interpretation nor a change for IR intrinsics' semantics.

Ideally you'd want these intrinsics for all code, yes, but

since backends don't dictate the IR pass pipeline it will be fragile/impossible to guarantee your pass for turning full vector operations into intrinsics will be last
there are operations not visible in the IR (such as register copies) for which you'll probably also need this sort of analysis

But yes, other than that you can run the same analysis before ISel.

All the sinks which normally let values escape and thus force "demanded-X" style analyses to be conservative (stores, calls, etc.) are restricted with a VL, so we don't need to worry about other code using the higher lanes

I disagree. You brought up regular vector instructions yourself: what happens if a full vector store (non-EVL) goes to a buffer and that buffer is then passed to a scalar function bar(double * rawData)? You have no idea which lanes bar is going to access. If it's an EVL-store, you will know, however.

If there is a store of this sort in the input program, there is nothing you can do about it, you can't justify changing it to an EVL-store at any stage, MIR or IR. It will just have to store the whole vector. Vector-ignorant IR optimizations don't introduce this sort of code (they should keep things in SSA values or promote from memory to SSA values, not demote) and vectorization passes which turn scalar stores into vector stores know to use the bounded intrinsics -- and usually have to use them for correctness anyway.

In natural code (strip-mined loops or vectorized functions), most instructions trivially have the same VL (same SSA value) -- we don't need algebraic transformations or anything to show that two instructions access the same number of lanes

This applies to both interpretations of excess lanes.

Yes, I am not saying this analysis isn't possible with excess lanes being undef, just arguing excess lanes being undef is not necessary for good codegen.

When the vector length does change in the middle of a computation, it typically becomes monotonically smaller in a fairly obvious way (e.g., speculative loads for search loop vectorization, or early exits in functions), so we don't need to worry about later instructions accessing elements that an earlier definition didn't write to

This may hold for some, basic vectorization schemes but it is too restrictive for more advanced techniques: e.g."FlexVec: auto-vectorization for irregular loops.", PLDI '16 or Dynamic SIMD Vector Lane Scheduling where new workitems are pulled into a spinning loop to avoid reducing the DVL whenever a thread drops out but you may decide to only do so if the occupation drops below a certain threshold.
RV already implements these transformations for regular LLVM IR and it will do so using the intrinsics and attributes we propose here.

Sorry, I was only talking about changes within an iteration of the vectorized loop. Across iterations, of course the vector length doesn't need to fall monotonically, even basic search loop vectorization (e.g. of strlen) doesn't satisfy that. It seems all the strategies you mention are smarter about how they pack scalar work items into vectors, but still only do this between iterations of the vectorized loop, i.e., don't do something like this:

loop {
    // first half of the work
    // pull in more work items
    // second half of the work on existing+newly pulled in work items
}

Is that right?

In the case where two instructions have completely unrelated vector lengths, they typically belong to two separate loops, between which you normally don't have any (vector-shaped) data flow to begin with, so the problem of what to do with the disabled lanes doesn't even arise

I haven't been able to evaluate this idea yet (partly because I don't have a complete enough compiler to compile benchmarks and see what happens) but for the above reasons I am quite optimistic that it will be enough to spill&fill vectors only up to VL, and solve other problems in the same stroke.

Aside: the advantage for spilling I see is just less memory traffic, I don't think you can make stack frames smaller since you generally won't have an upper bound on the vector lengths and recomputing frame layout every time the active vector length changes seems both very difficult for the backend and at best performance-neutral, more likely causing slowdowns.

The impact can be significant if the differences is a spill+reload of 256x8bytes worth of data. Besides, you could actually avoid memory all together, e.g. by compressing to vectors of size MVL/2 into one.

Impact on what? It's obvious that one shouldn't have to spill 256x8 bytes if only a small subset of the lanes is needed, I'm just saying I don't see a good way to avoid allocating stack space for the full vector ((assuming one needs a stack slot at all).

I'm not really sure what use the dynamicvl attribute on %MVL is if that parameter is always MVL. I guess you envision also using this intrinsic to stitch together shorter vectors? That's not an operation I've encountered before.

dynamicvl %MVL is the canonical way to express this without adding hard-coded knowledge. Lowering evl.compose is cheap on ARM-SVE and SX-Aurora (1. generate mask, 2. blend).

If this intrinsic is only ever used with dynamicvl %MVL and not with shorter dynamicvls, there's nothing there to express and we could drop the parameter altogether (not just the attribute, as I realize now). If you have a use case for compose(..., dynamicvl %something_shorter), then sure.

Thinking some more about it, a more convincing reason to keep it that way (to me) is that different architectures might have different behavior for masking and lanes-beyond-VL. RISC-V V, for a long time, wanted to zero lanes in both cases rather than keeping the old value around. For masking, merging with an "old" value is usually done because the application needs it, so generating an explicit merge instruction in those cases is probably fine, but having to copy over the completely irrelevant higher lanes would be pretty bad. The demanded-elements analysis mentioned above could help with this, though.

With undef-on-excess-lanes, you could actually implement a demanded-elements analysis on IR level. You'd then store your findings to lower the dynamicvl %dvl argument of the producing instruction. This may have positive interactions with other parts of LLVM as well. For example, in legalization for non-dvl targets it could mean that a EVL intrinsic could be pruned to the native vector length, implying that it lowered to a plain vector instruction (without any DVL/MVL loop).

You can have the analysis on IR either way, that some lanes are never read isn't affected by what you put in those lanes. I don't quite understand your point about legalization on other targets -- the analysis I propose makes the vector length shorter, to use a packed-SIMD architecture's full vectors you need the opposite (proving you're allowed to widen it to a full operation), which is a rather different task. If you can apply my analysis then you either already have full-width vector ops and don't need to do anything more to lower them well on a conventional SIMD architecture, or it will just replace one unknown dynamic vector length with another, possibly shorter, one.

All this being said, I want to be clear I don't really oppose excess lanes being undef. It doesn't seem necessary or even particularly helpful for the optimizations and codegen strategy I have planned, but it's not an obstacle either, so I'm happy to let the intrinsics be defined this way.

In D53613#1282198, @rkruppe wrote:

In D53613#1282096, @simoll wrote:

In D53613#1282063, @rkruppe wrote:

Spilling only the useful prefix of each vector is important, but I don't think we need to change the IR intrinsics' semantics to enable that. I've sketched an analysis that determines the demanded/consumed vector lengths of each vector value (on MIR in SSA form). With this information the backend can do the same optimization whenever the lanes beyond VL are not ever actually observed. This information is already necessary for many reasons other than spilling, such as implementing regular full-width vector operations (i.e., pretty much everything aside from the intrinsics we discuss here) that can sneak into the IR, or even ordinary register copies (on RISC-V at least). Normally I'd be hesitant to staking such an important aspect of code quality on a best-effort analysis, but in this case it seems very feasible to have very high precision:

Actually, you could translate regular vector code to EVL intrinsics first and have your backend only work on that. This is the route we are aiming for with the SX-Aurora SVE backend. We propose undef-on-excess-lanes as the default semantics of dynamicvl. There is no special interpretation nor a change for IR intrinsics' semantics.

Ideally you'd want these intrinsics for all code, yes, but

since backends don't dictate the IR pass pipeline it will be fragile/impossible to guarantee your pass for turning full vector operations into intrinsics will be last

Actually, you could use custom legalization in ISelLowering for this. No pass involved.

there are operations not visible in the IR (such as register copies) for which you'll probably also need this sort of analysis

Fair enough. Would it be possible to simply extend the %dvl of the defining operation to the newly created register? (instead of re-running a full fledged analysis).

But yes, other than that you can run the same analysis before ISel.

All the sinks which normally let values escape and thus force "demanded-X" style analyses to be conservative (stores, calls, etc.) are restricted with a VL, so we don't need to worry about other code using the higher lanes

I disagree. You brought up regular vector instructions yourself: what happens if a full vector store (non-EVL) goes to a buffer and that buffer is then passed to a scalar function bar(double * rawData)? You have no idea which lanes bar is going to access. If it's an EVL-store, you will know, however.

If there is a store of this sort in the input program, there is nothing you can do about it, you can't justify changing it to an EVL-store at any stage, MIR or IR. It will just have to store the whole vector. Vector-ignorant IR optimizations don't introduce this sort of code (they should keep things in SSA values or promote from memory to SSA values, not demote) and vectorization passes which turn scalar stores into vector stores know to use the bounded intrinsics -- and usually have to use them for correctness anyway.

Good point.

I think for me boils it down to this: without undef-on-excess there is no obvious way for programmers to specify that the result of an operation does not matter beyond the excess lanes and instead they are at the mercy of some clever analysis in the backend. HPC programmers dislike this sort of thing. A good example for this is register allocation/spilling where developers go a long way to guide the spilling heuristics in the right direction by placing branch probabilities and the like.

Sorry, I was only talking about changes within an iteration of the vectorized loop. Across iterations, of course the vector length doesn't need to fall monotonically, even basic search loop vectorization (e.g. of strlen) doesn't satisfy that. It seems all the strategies you mention are smarter about how they pack scalar work items into vectors, but still only do this between iterations of the vectorized loop, i.e., don't do something like this:
loop {
    // first half of the work
    // pull in more work items
    // second half of the work on existing+newly pulled in work items
}
Is that right?

Basically, yes:

while (any thread live) {
  // perform work
  // evaluate loop exit condition and deactivate leaving lanes
  if (/*number of active lanes*/ < threshold) {
    // pull in new work onto the inactive lanes
  }
}

In the case where two instructions have completely unrelated vector lengths, they typically belong to two separate loops, between which you normally don't have any (vector-shaped) data flow to begin with, so the problem of what to do with the disabled lanes doesn't even arise

I haven't been able to evaluate this idea yet (partly because I don't have a complete enough compiler to compile benchmarks and see what happens) but for the above reasons I am quite optimistic that it will be enough to spill&fill vectors only up to VL, and solve other problems in the same stroke.

Aside: the advantage for spilling I see is just less memory traffic, I don't think you can make stack frames smaller since you generally won't have an upper bound on the vector lengths and recomputing frame layout every time the active vector length changes seems both very difficult for the backend and at best performance-neutral, more likely causing slowdowns.

The impact can be significant if the differences is a spill+reload of 256x8bytes worth of data. Besides, you could actually avoid memory all together, e.g. by compressing to vectors of size MVL/2 into one.

Impact on what? It's obvious that one shouldn't have to spill 256x8 bytes if only a small subset of the lanes is needed, I'm just saying I don't see a good way to avoid allocating stack space for the full vector ((assuming one needs a stack slot at all).

Allocating stack space does not imply memory traffic per se. Spilling however does.

I'm not really sure what use the dynamicvl attribute on %MVL is if that parameter is always MVL. I guess you envision also using this intrinsic to stitch together shorter vectors? That's not an operation I've encountered before.

dynamicvl %MVL is the canonical way to express this without adding hard-coded knowledge. Lowering evl.compose is cheap on ARM-SVE and SX-Aurora (1. generate mask, 2. blend).

If this intrinsic is only ever used with dynamicvl %MVL and not with shorter dynamicvls, there's nothing there to express and we could drop the parameter altogether (not just the attribute, as I realize now). If you have a use case for compose(..., dynamicvl %something_shorter), then sure.

It's likely we will see vector code that only uses fractions of MVL (eg %MVL/2).

Thinking some more about it, a more convincing reason to keep it that way (to me) is that different architectures might have different behavior for masking and lanes-beyond-VL. RISC-V V, for a long time, wanted to zero lanes in both cases rather than keeping the old value around. For masking, merging with an "old" value is usually done because the application needs it, so generating an explicit merge instruction in those cases is probably fine, but having to copy over the completely irrelevant higher lanes would be pretty bad. The demanded-elements analysis mentioned above could help with this, though.

With undef-on-excess-lanes, you could actually implement a demanded-elements analysis on IR level. You'd then store your findings to lower the dynamicvl %dvl argument of the producing instruction. This may have positive interactions with other parts of LLVM as well. For example, in legalization for non-dvl targets it could mean that a EVL intrinsic could be pruned to the native vector length, implying that it lowered to a plain vector instruction (without any DVL/MVL loop).

You can have the analysis on IR either way, that some lanes are never read isn't affected by what you put in those lanes. I don't quite understand your point about legalization on other targets -- the analysis I propose makes the vector length shorter, to use a packed-SIMD architecture's full vectors you need the opposite (proving you're allowed to widen it to a full operation), which is a rather different task. If you can apply my analysis then you either already have full-width vector ops and don't need to do anything more to lower them well on a conventional SIMD architecture, or it will just replace one unknown dynamic vector length with another, possibly shorter, one.

%dvl = <complex integer arithmetic>
%dvl2 = <some more complex integer arithmetic>

%R = llvm.evl.fadd.v512f32(%a, %b, %m, 16)
%userOne = ... evl.fma.v512f32(%R, ..., %dvl)
%userTwo = ... evl.fma.v512f32(%R, ..., %dvl2)

With undef-on-excess, operations %userOne and %userTwo can be pruned to a width of 16.
Without looking any further at users of %userOne and %userTwo, this could be legalized into

%R = avx512.fma(%a, %b, %m)
%userOne = avx512.fma.v16f32(%R, ...)
%userTwo = avx512.fma.v16f32(%R, ...)

That's hard to do with maskedout-on-excess since it would all depend on %dvl and %dvl2.. and there would be no simple way for users to convey the information "dont care about %R beyond 16" in IR.

All this being said, I want to be clear I don't really oppose excess lanes being undef. It doesn't seem necessary or even particularly helpful for the optimizations and codegen strategy I have planned, but it's not an obstacle either, so I'm happy to let the intrinsics be defined this way.

Thank you for the scrutiny! It's important to validate this proposal so we have a robust representation that should work for all targets. I'll update the RFC shortly keeping undef-on-excess in place.

HI @fpetrogalli,

In D53613#1281444, @fpetrogalli wrote:
Hi @simoll ,

Definition 1: Vector Lane Predication (VLP)

For a given operation INST that operates on vector inputs of type <{scalable }n x type>, Vector Lane Predication is the operation of attaching to INST a vector with the same number of lanes {scalable }n, but with (boolean) lanes of type i1 that selects on which lanes the operation INST needs to be executed. This concept can be applied to any IR instruction that have vectors in any of the input operand or in the result.

This can be represented in the language as an additional parameter in form of a vector of i1 lanes.

To achieve VLP, the instruction cold be extended adding the VLP parameter as the last parameter ()this would make predicated INST distinguishable from the non predicated version):
%ret = <{scalable }n x ret_type> INST(<{scalable }n x type1> %in1, <{scalable }n x type2> %in1, ..., <{scalable }n x i1> %VLP)

As you state here, predicated INSTs would be indistinguishable from unpredicated INSTs if you are unaware of predication. As a result, every existing transformation that touches vector instructions will happily ignore the predicate and break your code. In effect this is similar to using metadata to annotate the predicate.

Definition 2: Dinamic Vector Length (DVL)

For a given operation INST that operates on vector inputs of type <{scalable }n x type>, Dynamic Vector Length is the concept of attaching a run-time value of integer type, say i32 %DVL, that informs the instruction to operate on the first VLP lanes of the vector and discard (or set as undef) the remaining ones.

With this concept, the instruction INST should be extended to accept an additional scalar parameter that would represent the number of active lanes in the operation:
%ret = <{scalable }n x ret_type> INST(<{scalable }n x type1> %in1, <{scalable }n x type2> %in2, ..., i32 %DVL)

Same reasoning as above.

DVL, VLP, Vector Length Agnostic (VLA) and Vector Length Specific (VLS) ISAs

To my understanding, DVL is orthogonal to VLA. In principle, you could have a VLA ISA that have an additional register for setting the DVL of the vectors.

DVL as part of VLP

Extending any language to support DVL and VLP would require some form of polymorfism in the language itself, because the same INST would need to have an additional parameter of two different types, the scalar i32 %DVL and the vector <{scalable }n x type> %VLP. I would like to argue that this is not necessary, because DVL is just a specific case of VLP. In fact, the same result of a DVL parameter could be obtained with a VLP parameter that would represent the split of the vector in active and inactive parts.

By using the extension of INST in the VLP definition, we could achieve DVL predication by introducing a new instruction, say VECTOR_SPLIT, that could generate the appropriate predicate partition for the instruction, as follows:
%VLP = <{scalable }n x i1> VECTOR_SPLIT( %i32 DVL)
%ret = <{scalable }n x ret_type> INST(<{scalable }n x type1> %in1, <{scalable }n x type2> %in2, ..., <{scalable }n x i1> %VLP)

Call boundaries will obscure any predicate producing code (Also see the Comment by @rogfer01.

Predication: extending IR instructions vs intrinsics

Whether we decide to go via extending the IR over adding new intrinsics, with the reasoning about the DVL and the VLP, we can assume that both cases will require just adding an additional VLP parameter, with, of course, the additional cost of adding a VECTOR_SPLIT instruction or intrinsic.

I personally would prefer to go by adding a new input parameter to the IR instruction (defaulted to "all true" when no predication is used), for two reasons:

avoid the explosion of intrinsics, that would need to be treated separately in all passes.

In fact, there already is an explosion of intrinsics. Basically every SIMD target exposes its ISA via buitin functions. These are highly target-specific and the semantics might not even be publically documented.
This RFC actually tries to come up with a set of target-agnostic,predicated,dvl primitives that should work well on all SIMD targets so we do not have to look into target-specific intrinsics to get functionality in IR (e.g. for predicated fdiv, if floating point is trapping).

for experience in extending LLVM IR instrucion to support the scalable vector type, I believe that there is little risk in doing so.

It's deceptively easy to modify the core IR. However, this has ramifications for all users (transformations, analyses, backends, ...). E.g. consider the concerns and discussions surrounding LLVM-SVE (llvm-dev, also D53695).

Please understand that this is my personal preference. I understand that the community and also my colleagues at Arm might have a different view on this.

Thank you for your take on this!

Do we really need to extend the IR?

My gut feeling is that we don't. I think we can ignore the scalable vs fixed vector types, and just consider the pure task of deciding whether we need to perform VLP predication vs DVL predication.Consider the following sequence.
%ret1 = <{scalable }n x ret_type> INST1(<{scalable }n x type1> ..., <{scalable }n x type2> , ...,)
%ret2 = <{scalable }n x ret_type> INST2(<{scalable }n x type1> ..., <{scalable }n x type2> , ...,)
%ret3 = <{scalable }n x ret_type> INST3(<{scalable }n x type1> ..., <{scalable }n x type2> , ...,)
%ret4 = <{scalable }n x ret_type> INST4(<{scalable }n x type1> ..., <{scalable }n x type2> , ...,)
It should be fairly easy to detect whether any SELECT instructions used on the input parameters or return values of the sequence is done using a VLP predication or DVL predication. At Arm, we use this
pattern matching mechanism to lower selects and unpredicated IR instructions into predicated SVE instruction. As far as I know, there are specific cases that might require us switching to a more sophisticated method (like native predication support in IR), but the pattern matching mechanism on select allowed us to cover the majority of the cases that we see.

This does not work for general function calls or any instructions with side effects (e.g. memory accesses), which is why there are already a bunch of intrinsics in the llvm.masked namespace (which this RFC hopes to supersede).
For the reasons brought up earlier (eg by @rkruppe) this is a non-starter for Dynamic Vector Length targets.

In D53613#1282267, @simoll wrote:

In D53613#1282198, @rkruppe wrote:

Ideally you'd want these intrinsics for all code, yes, but

since backends don't dictate the IR pass pipeline it will be fragile/impossible to guarantee your pass for turning full vector operations into intrinsics will be last

Actually, you could use custom legalization in ISelLowering for this. No pass involved.

Oh, I misunderstood you, sorry. If you meant lowering "add <n x i32>" to "dvl.add with dvl = MAX" that makes sense and it's basically what I'll be doing in RISC-V too (though I think I can just use patterns directly, no custom lowering code required). However, that still produces an inefficient full-width operation that isn't always necessary and fixing that needs some analysis.

there are operations not visible in the IR (such as register copies) for which you'll probably also need this sort of analysis

Fair enough. Would it be possible to simply extend the %dvl of the defining operation to the newly created register? (instead of re-running a full fledged analysis).

At MIR level, using the semantics of RISC-V instructions, that is not generally correct: uses of the copied register can run with a different VL and therefore use lanes that wouldn't be copied by this approach.

I think for me boils it down to this: without undef-on-excess there is no obvious way for programmers to specify that the result of an operation does not matter beyond the excess lanes and instead they are at the mercy of some clever analysis in the backend. HPC programmers dislike this sort of thing. A good example for this is register allocation/spilling where developers go a long way to guide the spilling heuristics in the right direction by placing branch probabilities and the like.

This is a good point. There's a bit more subtlety to it (users don't write LLVM IR themselves, we could pick a multi-instruction canonical form for making excess lanes undef and lower the code they write to that) but overall I agree that predictable optimizations are very important and since "don't care about excess lanes" is by far the more common choice we should optimize for representing that intent more naturally & reliably.

%dvl = <complex integer arithmetic>
%dvl2 = <some more complex integer arithmetic>

%R = llvm.evl.fadd.v512f32(%a, %b, %m, 16)
%userOne = ... evl.fma.v512f32(%R, ..., %dvl)
%userTwo = ... evl.fma.v512f32(%R, ..., %dvl2)
With undef-on-excess, operations %userOne and %userTwo can be pruned to a width of 16.
Without looking any further at users of %userOne and %userTwo, this could be legalized into
%R = avx512.fma(%a, %b, %m)
%userOne = avx512.fma.v16f32(%R, ...)
%userTwo = avx512.fma.v16f32(%R, ...)
That's hard to do with maskedout-on-excess since it would all depend on %dvl and %dvl2.. and there would be no simple way for users to convey the information "dont care about %R beyond 16" in IR.

Thanks you for the example, it makes sense, I guess I am too focused on the dynamic-MVL case. (I would not call this a "demanded"-style pass though, it just propagates the vector length forward and uses undef-excess-lanes to justify not having to compare 16 with %dvl and %dvl2.)

In D53613#1286515, @rkruppe wrote:

In D53613#1282267, @simoll wrote:

In D53613#1282198, @rkruppe wrote:

there are operations not visible in the IR (such as register copies) for which you'll probably also need this sort of analysis

Fair enough. Would it be possible to simply extend the %dvl of the defining operation to the newly created register? (instead of re-running a full fledged analysis).

At MIR level, using the semantics of RISC-V instructions, that is not generally correct: uses of the copied register can run with a different VL and therefore use lanes that wouldn't be copied by this approach.

Well, if you generate RISC-V instructions starting from EVL intrinsics then undef-on-excess still holds. So, excess lanes should be fair game for spilling. My hope is that %dvl could be annotated on MIR level like divergence is in the AMDGPU backend today. If the annotation is missing, you'd spill the full register.

In D53613#1286618, @simoll wrote:

In D53613#1286515, @rkruppe wrote:

In D53613#1282267, @simoll wrote:

In D53613#1282198, @rkruppe wrote:

there are operations not visible in the IR (such as register copies) for which you'll probably also need this sort of analysis

Fair enough. Would it be possible to simply extend the %dvl of the defining operation to the newly created register? (instead of re-running a full fledged analysis).

At MIR level, using the semantics of RISC-V instructions, that is not generally correct: uses of the copied register can run with a different VL and therefore use lanes that wouldn't be copied by this approach.

Well, if you generate RISC-V instructions starting from EVL intrinsics then undef-on-excess still holds. So, excess lanes should be fair game for spilling. My hope is that %dvl could be annotated on MIR level like divergence is in the AMDGPU backend today. If the annotation is missing, you'd spill the full register.

Yeah there's ways to pass on this information through MIR. Defining the RVV MachineInsts differently than the architecture defines the corresponding instructions isn't a good way in my opinion, but metadata on the instructions might work well. In any case this has drifted away from being directly relevant to this RFC. Thank you for the interesting discussion and once again for creating this RFC!

Today I took a stab at changing my RVV patches to use these intrinsics and that basically went well, affirming belief that these intrinsics are a good fit for RISC-V vectors. I stashes those changes for now rather than continuing to build on them because currently I can't match them with plain old isel patterns so I'd have to write annoying and error-prone custom lowering. That should be a temporary issue, partly due to how I don't really handle predication at the moment, partly due to a surprising extra argument on loads and stores (see inline comment).

FYI I noticed the argument numbers for the new attributes don't match the actual parameters in many cases (they often seem to be off by one). No big deal, just something to keep in mind for when the RFC goes through and the patch gets submitted for real.

include/llvm/IR/Intrinsics.td
1020	One of these i32 arguments it the `dynamic_vl` argument, what's the other? Alignment?

Changes

dynamic_vl -> vlen.
unmasked_ret -> maskedout_ret.
DVL -> EVL (Explicit Vector Length).
Added llvm.evl.compose(%A, %B, %pivot, %mvl) intrinsic (select on lane pivot).

In D53613#1287364, @rkruppe wrote:

Today I took a stab at changing my RVV patches to use these intrinsics and that basically went well, affirming belief that these intrinsics are a good fit for RISC-V vectors. I stashes those changes for now rather than continuing to build on them because currently I can't match them with plain old isel patterns so I'd have to write annoying and error-prone custom lowering. That should be a temporary issue, partly due to how I don't really handle predication at the moment, partly due to a surprising extra argument on loads and stores (see inline comment).

That's great news! Thanks for trying it out. Speaking of ISel, there should probably be one new ISD node type per EVL intrinsic.

FYI I noticed the argument numbers for the new attributes don't match the actual parameters in many cases (they often seem to be off by one). No big deal, just something to keep in mind for when the RFC goes through and the patch gets submitted for real.

The patch in this RFC is a showcase version to discuss the general concept (and sort out bike shedding issues). The actual patches will be cleaner.

include/llvm/IR/Intrinsics.td
1020	Yep. That's alignment as in llvm.masked.store.

rkruppe added inline comments.Nov 7 2018, 9:18 AM

include/llvm/IR/Intrinsics.td
1020	Ah, I forgot that the existing masked intrinsics also take the alignment as a normal parameter. I think new intrinsics shouldn't follow that precedent, nowadays we have the `align` attribute for call sites (already used e.g. by `llvm.memcpy`), so the alignment information can be supplied like this: call void @llvm.evl.store(<4 x i32> %v, <4 x i32>* align 16 %p, ...) This ensures that the alignment is a compile time constant, and during instruction selection and later stages, it should be stored in the MachineMemOperand, not be an extra operand (that's the part that caused me trouble in my experiments).

simoll marked 3 inline comments as done.Nov 9 2018, 5:15 AM

simoll added inline comments.

include/llvm/IR/Intrinsics.td
1020	Ok. I'll drop the alignment arguments in the next update.

anemet added a subscriber: anemet.Dec 20 2018, 3:08 PM

rudkx added a subscriber: rudkx.Dec 20 2018, 3:13 PM

rengolin added a subscriber: rengolin.Dec 21 2018, 3:20 AM

@simoll
Sorry, I'm joining this conversation late.

Given our recent conversation on llvm-dev, can we capture your PredicatedVectorType idea here? Or maybe that warrants a separate RFC. I don't think it addressess EVL at all but maybe there is some way to extend it. I just want to make sure we record the ideas so that when we look to enhance the core IR, we take a look at them.

Part of me is left wondering whether PredicatedVectorType is something we should explore as an alternative to intrinsics. I know lots of people are waiting on this support so I don't want to upset the apple cart and make everyone wait for an idea that may or may not work out.

cameron.mcinally added a subscriber: cameron.mcinally.Dec 21 2018, 9:52 AM

In D53613#1339191, @greened wrote:

@simoll
Sorry, I'm joining this conversation late.

No worries. You are here now :)

Given our recent conversation on llvm-dev, can we capture your PredicatedVectorType idea here? Or maybe that warrants a separate RFC. I don't think it addressess EVL at all but maybe there is some way to extend it. I just want to make sure we record the ideas so that when we look to enhance the core IR, we take a look at them.

I think there should be an RFC about defining the road map to proper predication/EVL support in LLVM core IR and that is where PredicatedVectorType (would - see below) belong (I'll start an RFC in January).

Part of me is left wondering whether PredicatedVectorType is something we should explore as an alternative to intrinsics. I know lots of people are waiting on this support so I don't want to upset the apple cart and make everyone wait for an idea that may or may not work out.

I think we should go with EVL intrinsics & attributes now.

That being said, i believe the code duplication issue with intrinsics can be worked around with proper engineering/abstraction. Eg we could have EVLInstruction / EVLBinaryOperator classes to abstract away the gap:

Value & FADDevl = "`%v = call @llvm.evl.fadd.v51232(...)`" // handle to the RHS instruction

Value & FADDclassic = "%w = fadd <512 x f32> ..."

auto & evlPredFADD = cast<PredicatedBinaryOperator>(FADDevl)
assert(evlPredFADD.getMask());
auto & evlClassicFADD = cast<PredicatedBinaryOperator>(FADDclassic)
assert(!evlClassicFADD.getMask());
// otw, evlPredFADD and evlClassicFADD behave just like the BinaryOperator today


// the class hierarchy could look like this:

// everything below this class in the hierarchy is potentially predicated
class PredicatedInstruction : public Instruction {
public:
  virtual Value* getMask();
  virtual Value* getExplicitVectorLength();
};

class PredicatedBinaryOperator : public PredicatedInstruction {
  // all members of today's BinaryOperator
};

// the regular BinaryOperator is unpredicated as it is today
class BinaryOperator : public PredicatedBinaryOperator {
public:
  getMask() { return nullptr; }
  getExplicitVectorLength() { return nullptr; }
};

// the same familiar interface but in fact a wrapper for EVL intrinsics..
class EVLBinaryOperator : public PredicatedBinaryOperator {
  CallInst * IntrinsicCall;
  int MaskPos;
  int EVLPos;
public:
  getMask() { return MaskPos < 0 ? nullptr : IntrinsicCall->getArgOperand(MaskPos); }
  getExplicitVectorLength { return EVLPos < 0 ? nullptr : IntrinsicCall->getArgOperand(EVLPos); }
}:

You can now, as in the PredicatedVectorType proposal, switch transformations/anslysis from BinaryOperator to PredicatedBinaryOperator one at a time.
In fact, if this is executed properly it doesn't actually matter that much anymore whether we use intrinsics or predicated core instructions... and so PredicatedVectorType is obsolete.

Changes:

maskedout_ret -> passthru.
removed legacy alignment argument from scatter/gather.
fixed some attribute placements in EVL intrinsics.

Harbormaster completed remote builds in B26232: Diff 179308.Dec 21 2018, 10:26 AM

simoll marked an inline comment as done.Dec 21 2018, 10:27 AM

In D53613#1339293, @simoll wrote:

You can now, as in the PredicatedVectorType proposal, switch transformations/anslysis from BinaryOperator to PredicatedBinaryOperator one at a time.
In fact, if this is executed properly it doesn't actually matter that much anymore whether we use intrinsics or predicated core instructions... and so PredicatedVectorType is obsolete.

I understand why BinaryOperator inherits from PredicatedBinaryOperator (so existing code looking at BinaryOperators won't optimize predicated code it doesn't know about). It's little mind-bending as it's not really an is-a relationship anymore. Is there code that might look at Instructions and have the same problem? That is, could code that looks at Instruction illegally optimize predicated code it's not aware of?

What are the semantics for a call that doesn't have a passthru attribute? For disabled lanes what's the expected output value? I hope it's undef.

In D53613#1339653, @greened wrote:

I understand why BinaryOperator inherits from PredicatedBinaryOperator (so existing code looking at BinaryOperators won't optimize predicated code it doesn't know about). It's little mind-bending as it's not really an is-a relationship anymore. Is there code that might look at Instructions and have the same problem? That is, could code that looks at Instruction illegally optimize predicated code it's not aware of?

It is actually very much an is-a relationship because an unpredicated operator is an (optionally) predicated operator that never has a predicate (so it`s a proper subset functionality wise).
It's still just an intrinsic so i do not see how transformations that only look at Instruction and don't dig deeper could break the EVL intrinsic call.

In D53613#1339680, @greened wrote:

What are the semantics for a call that doesn't have a passthru attribute? For disabled lanes what's the expected output value? I hope it's undef.

In the general case (call @foo), the result on masked-off lanes is just unknown. One example already in the intrinsics are masked reductions:

%r = llvm.evl.reduce.fadd(<8 x f64> %data, mask <8 x i1> %mask, vlen i32 %evl)

One of the main purposes of the mask attribute is to annotate the mask argument for calling conventions. Attaching additional semantics to mask on its own it would defy that purpose.

It's still just an intrinsic so i do not see how transformations that only look at Instruction and don't dig deeper could break the EVL intrinsic call.

Yes, good point.

In the general case (call @foo), the result on masked-off lanes is just unknown.

Would declaring the elements undef have any advantages? A use of undef is known to be, well, undefined behavior and the optimizer can take advantage of that. If the value is simply "unknown" I don't think the same can be done.

In D53613#1347782, @greened wrote:

In the general case (call @foo), the result on masked-off lanes is just unknown.

Would declaring the elements undef have any advantages? A use of undef is known to be, well, undefined behavior and the optimizer can take advantage of that. If the value is simply "unknown" I don't think the same can be done.

Well, there is a clear downside to declaring masked-off return lanes undef by default:
Say, a user defines a function @foo that takes a mask and produces a well-defined result on masked-off lanes. With undef on masked-off lanes, the user is not able to use the mask attribute for the mask argument.
That means that @foo is precluded from calling conventions that require an annotated mask (which may exist at some point).

The underlying problem is that we can not assume that @foo has a map-like semantics (e.g. that the output element at position l only depends on the inputs' elements at position l).

I'd rather not assign any additional meaning to the mask attribute on its own than "this argument is the mask" (unless passthru is given). If we find a good reason to change that we can always add function attributes to describe that behavior. For example, it would be helpful to know that a function call can be skipped when the mask is all-false (even if the function may access memory). That is however stronger than "undef on masked-off lanes" already.

Changes

EVL intrinsics no longer use the passthru attribute. An explicit select should be used to obtain defined vector elements where the mask in the intrinsic was false. passthru is still useful for general functions as in call @foo.
The %passthru argument of llvm.evl.gather was dropped in favor of a select-based pattern as above.
DAGBuilder integration (llvm.evl.fadd -> evl_fadd SDNode).
EVLBuilder convenience builder for EVL intrinsics, allows direct mapping from scalar instructions to EVL intrinsics.

This version works with the EVL intrinsic of the Region Vectorizer (available at https://github.com/cdl-saarland/rv/tree/feature/evl ).
Thank you for the helpful discussions!

Herald added a subscriber: mgorny. · View Herald TranscriptJan 18 2019, 7:48 AM

Harbormaster completed remote builds in B27047: Diff 182513.Jan 18 2019, 7:48 AM

Changes

FMA fusion! DAGCombiner lifted to work on EVL SDNodes as well as on regular SDNodes.
Native EVL SDNodes on ISel level.
Various fixes: gather/scatter cleanup, canonicalized reduction intrinsics, issues in TableGen's intrinsic generator code, ..

EVL development is now happening on https://github.com/cdl-saarland/llvm-evl

Harbormaster completed remote builds in B27192: Diff 183102.Jan 23 2019, 8:38 AM

I've opened a new RFC for a roadmap for vector predication (and a more up-to-date EVL prototype) - https://reviews.llvm.org/D57504 .

dmgreen added a subscriber: dmgreen.Mar 28 2019, 9:59 AM

Herald added a project: Restricted Project. · View Herald TranscriptMar 28 2019, 9:59 AM

Herald added a subscriber: jdoerfert. · View Herald Transcript

rudkx removed a subscriber: rudkx.Mar 28 2019, 10:06 AM

vchuravy added a subscriber: vchuravy.May 5 2019, 7:57 PM

Hi @simoll,

I have a question about the meaning of the vlen operand so I want to double-check here.

The current draft of the RISC-V V-extension has a vsetvl instruction that sets the vector length register (vl). This instruction has an input operand called application vector length (AVL) in the spec (but for me makes more sense to call it the requested vector length, RVL) along with some other input operands that, I think, should not be too relevant now (the width of the element of the vector and a grouping register factor). The execution of this instruction computes the new value of VL (fulfilling a few rules described here: https://riscv.github.io/documents/riscv-v-spec/#_constraints_on_setting_vl). I like to call the computed value the granted vector length (GVL).

Back to the question: my assumption now is that the vlen operand in the proposed intrinsics, logically maps to the GVL (and not the RVL which would entail creating a strip-mining loop around to implement each intrinsic and does not seem very practical to me).

Is my understanding correct?

Thank you very much!

In D53613#1502616, @rogfer01 wrote:

Hi @simoll,

I have a question about the meaning of the vlen operand so I want to double-check here.

The current draft of the RISC-V V-extension has a vsetvl instruction that sets the vector length register (vl). This instruction has an input operand called application vector length (AVL) in the spec (but for me makes more sense to call it the requested vector length, RVL) along with some other input operands that, I think, should not be too relevant now (the width of the element of the vector and a grouping register factor). The execution of this instruction computes the new value of VL (fulfilling a few rules described here: https://riscv.github.io/documents/riscv-v-spec/#_constraints_on_setting_vl). I like to call the computed value the granted vector length (GVL).

Back to the question: my assumption now is that the vlen operand in the proposed intrinsics, logically maps to the GVL (and not the RVL which would entail creating a strip-mining loop around to implement each intrinsic and does not seem very practical to me).

Is my understanding correct?

It makes sense to interpret the vlen arg as GVL for RVV. The GVL could be derived from the AVL with an additional (target specific(?)) intrinsic. For targets where AVL==RVL this would be unnecessary. This came up during the llvm-dev discussion (http://lists.llvm.org/pipermail/llvm-dev/2019-February/129971.html).

wuiw added a subscriber: wuiw.Jul 1 2019, 8:21 PM

Herald added a reviewer: rengolin. · View Herald TranscriptJul 1 2019, 8:21 PM

Adding llvm-commits for wider audience

In D53613#1566133, @rengolin wrote:

Adding llvm-commits for wider audience

It's best to submit a new review.
Else no one sees the description as a mail.

In D53613#1566137, @lebedev.ri wrote:

It's best to submit a new review.
Else no one sees the description as a mail.

That's what D57504 is about. But all reviews need to have llvm-commits in them, so I just added.

When the discussion about which path to take (types or intrinsics) is over, this might end up as still the right path.

If the authors feel this is not the case anymore, they can close it, sure.

evandro added a subscriber: evandro.Jan 15 2020, 4:05 PM

Herald added subscribers: luismarques, s.egerton, lenary and 2 others. · View Herald TranscriptJan 15 2020, 4:05 PM

HsiangKai added a subscriber: HsiangKai.Feb 25 2020, 4:31 AM

vkmr added a subscriber: vkmr.Apr 28 2020, 9:10 AM

khchen added a subscriber: khchen.Oct 6 2020, 12:00 AM

Herald added a reviewer: jdoerfert. · View Herald TranscriptOct 6 2020, 12:00 AM

rkruppe mentioned this in D57504: RFC: Prototype & Roadmap for vector predication in LLVM.Dec 2 2020, 9:08 AM

Revision Contents

Path

Size

include/

llvm/

Bitcode/

LLVMBitCodes.h

3 lines

CodeGen/

ISDOpcodes.h

50 lines

SelectionDAG.h

14 lines

SelectionDAGNodes.h

217 lines

IR/

9 lines

84 lines

79 lines

280 lines

Target/

TargetSelectionDAG.td

71 lines

lib/

AsmParser/

LLLexer.cpp

3 lines

LLParser.cpp

21 lines

LLToken.h

3 lines

Bitcode/

Reader/

BitcodeReader.cpp

6 lines

Writer/

BitcodeWriter.cpp

6 lines

CodeGen/

SelectionDAG/

DAGCombiner.cpp

217 lines

LegalizeIntegerTypes.cpp

25 lines

LegalizeTypes.h

2 lines

SelectionDAG.cpp

338 lines

SelectionDAGBuilder.h

6 lines

SelectionDAGBuilder.cpp

381 lines

SelectionDAGDumper.cpp

59 lines

SelectionDAGISel.cpp

4 lines

IR/

6 lines

1 line

251 lines

83 lines

28 lines

Transforms/

Utils/

CodeExtractor.cpp

3 lines

test/

Bitcode/

attributes.ll

5 lines

Verifier/

evl_attribs.ll

13 lines

utils/

TableGen/

CodeGenIntrinsics.h

2 lines

CodeGenTarget.cpp

28 lines

IntrinsicEmitter.cpp

18 lines

Diff 183102

include/llvm/Bitcode/LLVMBitCodes.h

Show First 20 Lines • Show All 597 Lines • ▼ Show 20 Lines	enum AttributeKindCodes {
ATTR_KIND_WRITEONLY = 52,		ATTR_KIND_WRITEONLY = 52,
ATTR_KIND_SPECULATABLE = 53,		ATTR_KIND_SPECULATABLE = 53,
ATTR_KIND_STRICT_FP = 54,		ATTR_KIND_STRICT_FP = 54,
ATTR_KIND_SANITIZE_HWADDRESS = 55,		ATTR_KIND_SANITIZE_HWADDRESS = 55,
ATTR_KIND_NOCF_CHECK = 56,		ATTR_KIND_NOCF_CHECK = 56,
ATTR_KIND_OPT_FOR_FUZZING = 57,		ATTR_KIND_OPT_FOR_FUZZING = 57,
ATTR_KIND_SHADOWCALLSTACK = 58,		ATTR_KIND_SHADOWCALLSTACK = 58,
ATTR_KIND_SPECULATIVE_LOAD_HARDENING = 59,		ATTR_KIND_SPECULATIVE_LOAD_HARDENING = 59,
		ATTR_KIND_MASK = 60,
		ATTR_KIND_VECTORLENGTH = 61,
		ATTR_KIND_PASSTHRU = 62,
};		};

enum ComdatSelectionKindCodes {		enum ComdatSelectionKindCodes {
COMDAT_SELECTION_KIND_ANY = 1,		COMDAT_SELECTION_KIND_ANY = 1,
COMDAT_SELECTION_KIND_EXACT_MATCH = 2,		COMDAT_SELECTION_KIND_EXACT_MATCH = 2,
COMDAT_SELECTION_KIND_LARGEST = 3,		COMDAT_SELECTION_KIND_LARGEST = 3,
COMDAT_SELECTION_KIND_NO_DUPLICATES = 4,		COMDAT_SELECTION_KIND_NO_DUPLICATES = 4,
COMDAT_SELECTION_KIND_SAME_SIZE = 5,		COMDAT_SELECTION_KIND_SAME_SIZE = 5,
Show All 14 Lines

include/llvm/CodeGen/ISDOpcodes.h

Show First 20 Lines • Show All 192 Lines • ▼ Show 20 Lines	enum NodeType {
/// them all as its individual results. This nodes has exactly the same		/// them all as its individual results. This nodes has exactly the same
/// number of inputs and outputs. This node is useful for some pieces of the		/// number of inputs and outputs. This node is useful for some pieces of the
/// code generator that want to think about a single node with multiple		/// code generator that want to think about a single node with multiple
/// results, not multiple nodes.		/// results, not multiple nodes.
MERGE_VALUES,		MERGE_VALUES,

/// Simple integer binary arithmetic operators.		/// Simple integer binary arithmetic operators.
ADD, SUB, MUL, SDIV, UDIV, SREM, UREM,		ADD, SUB, MUL, SDIV, UDIV, SREM, UREM,
		EVL_ADD, EVL_SUB, EVL_MUL, EVL_SDIV, EVL_UDIV, EVL_SREM, EVL_UREM,

/// SMUL_LOHI/UMUL_LOHI - Multiply two integers of type iN, producing		/// SMUL_LOHI/UMUL_LOHI - Multiply two integers of type iN, producing
/// a signed/unsigned value of type i[2*N], and return the full value as		/// a signed/unsigned value of type i[2*N], and return the full value as
/// two results, each of type iN.		/// two results, each of type iN.
SMUL_LOHI, UMUL_LOHI,		SMUL_LOHI, UMUL_LOHI,

/// SDIVREM/UDIVREM - Divide two integers and produce both a quotient and		/// SDIVREM/UDIVREM - Divide two integers and produce both a quotient and
/// remainder result.		/// remainder result.
▲ Show 20 Lines • Show All 66 Lines • ▼ Show 20 Lines	enum NodeType {
/// 2 integers with the same width and scale. SCALE represents the scale of		/// 2 integers with the same width and scale. SCALE represents the scale of
/// both operands as fixed point numbers. This SCALE parameter must be a		/// both operands as fixed point numbers. This SCALE parameter must be a
/// constant integer. A scale of zero is effectively performing		/// constant integer. A scale of zero is effectively performing
/// multiplication on 2 integers.		/// multiplication on 2 integers.
SMULFIX,		SMULFIX,

/// Simple binary floating point operators.		/// Simple binary floating point operators.
FADD, FSUB, FMUL, FDIV, FREM,		FADD, FSUB, FMUL, FDIV, FREM,
		EVL_FADD, EVL_FSUB, EVL_FMUL, EVL_FDIV, EVL_FREM,

/// Constrained versions of the binary floating point operators.		/// Constrained versions of the binary floating point operators.
/// These will be lowered to the simple operators before final selection.		/// These will be lowered to the simple operators before final selection.
/// They are used to limit optimizations while the DAG is being		/// They are used to limit optimizations while the DAG is being
/// optimized.		/// optimized.
STRICT_FADD, STRICT_FSUB, STRICT_FMUL, STRICT_FDIV, STRICT_FREM,		STRICT_FADD, STRICT_FSUB, STRICT_FMUL, STRICT_FDIV, STRICT_FREM,
STRICT_FMA,		STRICT_FMA,

/// Constrained versions of libm-equivalent floating point intrinsics.		/// Constrained versions of libm-equivalent floating point intrinsics.
/// These will be lowered to the equivalent non-constrained pseudo-op		/// These will be lowered to the equivalent non-constrained pseudo-op
/// (or expanded to the equivalent library call) before final selection.		/// (or expanded to the equivalent library call) before final selection.
/// They are used to limit optimizations while the DAG is being optimized.		/// They are used to limit optimizations while the DAG is being optimized.
STRICT_FSQRT, STRICT_FPOW, STRICT_FPOWI, STRICT_FSIN, STRICT_FCOS,		STRICT_FSQRT, STRICT_FPOW, STRICT_FPOWI, STRICT_FSIN, STRICT_FCOS,
STRICT_FEXP, STRICT_FEXP2, STRICT_FLOG, STRICT_FLOG10, STRICT_FLOG2,		STRICT_FEXP, STRICT_FEXP2, STRICT_FLOG, STRICT_FLOG10, STRICT_FLOG2,
STRICT_FRINT, STRICT_FNEARBYINT, STRICT_FMAXNUM, STRICT_FMINNUM,		STRICT_FRINT, STRICT_FNEARBYINT, STRICT_FMAXNUM, STRICT_FMINNUM,
STRICT_FCEIL, STRICT_FFLOOR, STRICT_FROUND, STRICT_FTRUNC,		STRICT_FCEIL, STRICT_FFLOOR, STRICT_FROUND, STRICT_FTRUNC,

/// FMA - Perform a * b + c with no intermediate rounding step.		/// FMA - Perform a * b + c with no intermediate rounding step.
FMA,		FMA,
		EVL_FMA,

/// FMAD - Perform a * b + c, while getting the same result as the		/// FMAD - Perform a * b + c, while getting the same result as the
/// separately rounded operations.		/// separately rounded operations.
FMAD,		FMAD,

/// FCOPYSIGN(X, Y) - Return the value of X with the sign of Y. NOTE: This		/// FCOPYSIGN(X, Y) - Return the value of X with the sign of Y. NOTE: This
/// DAG node does not require that X and Y have the same type, just that		/// DAG node does not require that X and Y have the same type, just that
/// they are both floating point. X and the result must have the same type.		/// they are both floating point. X and the result must have the same type.
▲ Show 20 Lines • Show All 50 Lines • ▼ Show 20 Lines	enum NodeType {
/// VEC1/VEC2. A VECTOR_SHUFFLE node also contains an array of constant int		/// VEC1/VEC2. A VECTOR_SHUFFLE node also contains an array of constant int
/// values that indicate which value (or undef) each result element will		/// values that indicate which value (or undef) each result element will
/// get. These constant ints are accessible through the		/// get. These constant ints are accessible through the
/// ShuffleVectorSDNode class. This is quite similar to the Altivec		/// ShuffleVectorSDNode class. This is quite similar to the Altivec
/// 'vperm' instruction, except that the indices must be constants and are		/// 'vperm' instruction, except that the indices must be constants and are
/// in terms of the element size of VEC1/VEC2, not in terms of bytes.		/// in terms of the element size of VEC1/VEC2, not in terms of bytes.
VECTOR_SHUFFLE,		VECTOR_SHUFFLE,

		/// EVL_VSHIFT(VEC1, AMOUNT, MASK, VLEN) - Returns a vector, of the same type as
		/// VEC1. AMOUNT is an integer value. The returned vector is equivalent
		/// to VEC1 shifted by AMOUNT (RETURNED_VEC[idx] = VEC1[idx + AMOUNT]).
		EVL_VSHIFT,

		/// EVL_COMPRESS(VEC1, MASK, VLEN) - Returns a vector, of the same type as
		/// VEC1.
		EVL_COMPRESS,

		/// EVL_EXPAND(VEC1, MASK, VLEN) - Returns a vector, of the same type as
		/// VEC1.
		EVL_EXPAND,

/// SCALAR_TO_VECTOR(VAL) - This represents the operation of loading a		/// SCALAR_TO_VECTOR(VAL) - This represents the operation of loading a
/// scalar value into element 0 of the resultant vector type. The top		/// scalar value into element 0 of the resultant vector type. The top
/// elements 1 to N-1 of the N-element vector are undefined. The type		/// elements 1 to N-1 of the N-element vector are undefined. The type
/// of the operand must match the vector element type, except when they		/// of the operand must match the vector element type, except when they
/// are integer types. In this case the operand is allowed to be wider		/// are integer types. In this case the operand is allowed to be wider
/// than the vector element type, and is implicitly truncated to it.		/// than the vector element type, and is implicitly truncated to it.
SCALAR_TO_VECTOR,		SCALAR_TO_VECTOR,

/// MULHU/MULHS - Multiply high - Multiply two integers of type iN,		/// MULHU/MULHS - Multiply high - Multiply two integers of type iN,
/// producing an unsigned/signed value of type i[2*N], then return the top		/// producing an unsigned/signed value of type i[2*N], then return the top
/// part.		/// part.
MULHU, MULHS,		MULHU, MULHS,

/// [US]{MIN/MAX} - Binary minimum or maximum or signed or unsigned		/// [US]{MIN/MAX} - Binary minimum or maximum or signed or unsigned
/// integers.		/// integers.
SMIN, SMAX, UMIN, UMAX,		SMIN, SMAX, UMIN, UMAX,

/// Bitwise operators - logical and, logical or, logical xor.		/// Bitwise operators - logical and, logical or, logical xor.
AND, OR, XOR,		AND, OR, XOR,
		EVL_AND, EVL_OR, EVL_XOR,

/// ABS - Determine the unsigned absolute value of a signed integer value of		/// ABS - Determine the unsigned absolute value of a signed integer value of
/// the same bitwidth.		/// the same bitwidth.
/// Note: A value of INT_MIN will return INT_MIN, no saturation or overflow		/// Note: A value of INT_MIN will return INT_MIN, no saturation or overflow
/// is performed.		/// is performed.
ABS,		ABS,

/// Shift and rotation operations. After legalization, the type of the		/// Shift and rotation operations. After legalization, the type of the
/// shift amount is known to be TLI.getShiftAmountTy(). Before legalization		/// shift amount is known to be TLI.getShiftAmountTy(). Before legalization
/// the shift amount can be any type, but care must be taken to ensure it is		/// the shift amount can be any type, but care must be taken to ensure it is
/// large enough. TLI.getShiftAmountTy() is i8 on some targets, but before		/// large enough. TLI.getShiftAmountTy() is i8 on some targets, but before
/// legalization, types like i1024 can occur and i8 doesn't have enough bits		/// legalization, types like i1024 can occur and i8 doesn't have enough bits
/// to represent the shift amount.		/// to represent the shift amount.
/// When the 1st operand is a vector, the shift amount must be in the same		/// When the 1st operand is a vector, the shift amount must be in the same
/// type. (TLI.getShiftAmountTy() will return the same type when the input		/// type. (TLI.getShiftAmountTy() will return the same type when the input
/// type is a vector.)		/// type is a vector.)
/// For rotates and funnel shifts, the shift amount is treated as an unsigned		/// For rotates and funnel shifts, the shift amount is treated as an unsigned
/// amount modulo the element size of the first operand.		/// amount modulo the element size of the first operand.
///		///
/// Funnel 'double' shifts take 3 operands, 2 inputs and the shift amount.		/// Funnel 'double' shifts take 3 operands, 2 inputs and the shift amount.
/// fshl(X,Y,Z): (X << (Z % BW)) \| (Y >> (BW - (Z % BW)))		/// fshl(X,Y,Z): (X << (Z % BW)) \| (Y >> (BW - (Z % BW)))
/// fshr(X,Y,Z): (X << (BW - (Z % BW))) \| (Y >> (Z % BW))		/// fshr(X,Y,Z): (X << (BW - (Z % BW))) \| (Y >> (Z % BW))
SHL, SRA, SRL, ROTL, ROTR, FSHL, FSHR,		SHL, SRA, SRL, ROTL, ROTR, FSHL, FSHR,
		EVL_SHL, EVL_SRA, EVL_SRL,

/// Byte Swap and Counting operators.		/// Byte Swap and Counting operators.
BSWAP, CTTZ, CTLZ, CTPOP, BITREVERSE,		BSWAP, CTTZ, CTLZ, CTPOP, BITREVERSE,

/// Bit counting operators with an undefined result for zero inputs.		/// Bit counting operators with an undefined result for zero inputs.
CTTZ_ZERO_UNDEF, CTLZ_ZERO_UNDEF,		CTTZ_ZERO_UNDEF, CTLZ_ZERO_UNDEF,

/// Select(COND, TRUEVAL, FALSEVAL). If the type of the boolean COND is not		/// Select(COND, TRUEVAL, FALSEVAL). If the type of the boolean COND is not
/// i1 then the high bits must conform to getBooleanContents.		/// i1 then the high bits must conform to getBooleanContents.
SELECT,		SELECT,

/// Select with a vector condition (op #0) and two vector operands (ops #1		/// Select with a vector condition (op #0) and two vector operands (ops #1
/// and #2), returning a vector result. All vectors have the same length.		/// and #2), returning a vector result. All vectors have the same length.
/// Much like the scalar select and setcc, each bit in the condition selects		/// Much like the scalar select and setcc, each bit in the condition selects
/// whether the corresponding result element is taken from op #1 or op #2.		/// whether the corresponding result element is taken from op #1 or op #2.
/// At first, the VSELECT condition is of vXi1 type. Later, targets may		/// At first, the VSELECT condition is of vXi1 type. Later, targets may
/// change the condition type in order to match the VSELECT node using a		/// change the condition type in order to match the VSELECT node using a
/// pattern. The condition follows the BooleanContent format of the target.		/// pattern. The condition follows the BooleanContent format of the target.
VSELECT,		VSELECT,
		EVL_SELECT,

		/// Select with an integer pivot (op #0) and two vector operands (ops #1
		/// and #2), returning a vector result. All vectors have the same length.
		/// Similar to the vector select, a comparison of the results element index
		/// with the integer pivot selects hether the corresponding result element
		/// is taken from op #1 or op #2.
		EVL_COMPOSE,

/// Select with condition operator - This selects between a true value and		/// Select with condition operator - This selects between a true value and
/// a false value (ops #2 and #3) based on the boolean result of comparing		/// a false value (ops #2 and #3) based on the boolean result of comparing
/// the lhs and rhs (ops #0 and #1) of a conditional expression with the		/// the lhs and rhs (ops #0 and #1) of a conditional expression with the
/// condition code in op #4, a CondCodeSDNode.		/// condition code in op #4, a CondCodeSDNode.
SELECT_CC,		SELECT_CC,

/// SetCC operator - This evaluates to a true value iff the condition is		/// SetCC operator - This evaluates to a true value iff the condition is
/// true. If the result value type is not i1 then the high bits conform		/// true. If the result value type is not i1 then the high bits conform
/// to getBooleanContents. The operands to this are the left and right		/// to getBooleanContents. The operands to this are the left and right
/// operands to compare (ops #0, and #1) and the condition code to compare		/// operands to compare (ops #0, and #1) and the condition code to compare
/// them with (op #2) as a CondCodeSDNode. If the operands are vector types		/// them with (op #2) as a CondCodeSDNode. If the operands are vector types
/// then the result type must also be a vector type.		/// then the result type must also be a vector type.
SETCC,		SETCC,
		EVL_SETCC,

/// Like SetCC, ops #0 and #1 are the LHS and RHS operands to compare, but		/// Like SetCC, ops #0 and #1 are the LHS and RHS operands to compare, but
/// op #2 is a boolean indicating if there is an incoming carry. This		/// op #2 is a boolean indicating if there is an incoming carry. This
/// operator checks the result of "LHS - RHS - Carry", and can be used to		/// operator checks the result of "LHS - RHS - Carry", and can be used to
/// compare two wide integers:		/// compare two wide integers:
/// (setcccarry lhshi rhshi (subcarry lhslo rhslo) cc).		/// (setcccarry lhshi rhshi (subcarry lhslo rhslo) cc).
/// Only valid for integers.		/// Only valid for integers.
SETCCCARRY,		SETCCCARRY,
▲ Show 20 Lines • Show All 127 Lines • ▼ Show 20 Lines	enum NodeType {
/// form a semi-softened interface for dealing with f16 (as an i16), which		/// form a semi-softened interface for dealing with f16 (as an i16), which
/// is often a storage-only type but has native conversions.		/// is often a storage-only type but has native conversions.
FP16_TO_FP, FP_TO_FP16,		FP16_TO_FP, FP_TO_FP16,

/// Perform various unary floating-point operations inspired by libm.		/// Perform various unary floating-point operations inspired by libm.
FNEG, FABS, FSQRT, FCBRT, FSIN, FCOS, FPOWI, FPOW,		FNEG, FABS, FSQRT, FCBRT, FSIN, FCOS, FPOWI, FPOW,
FLOG, FLOG2, FLOG10, FEXP, FEXP2,		FLOG, FLOG2, FLOG10, FEXP, FEXP2,
FCEIL, FTRUNC, FRINT, FNEARBYINT, FROUND, FFLOOR,		FCEIL, FTRUNC, FRINT, FNEARBYINT, FROUND, FFLOOR,
		EVL_FNEG,
/// FMINNUM/FMAXNUM - Perform floating-point minimum or maximum on two		/// FMINNUM/FMAXNUM - Perform floating-point minimum or maximum on two
/// values.		/// values.
//		//
/// In the case where a single input is a NaN (either signaling or quiet),		/// In the case where a single input is a NaN (either signaling or quiet),
/// the non-NaN input is returned.		/// the non-NaN input is returned.
///		///
/// The return value of (FMINNUM 0.0, -0.0) could be either 0.0 or -0.0.		/// The return value of (FMINNUM 0.0, -0.0) could be either 0.0 or -0.0.
FMINNUM, FMAXNUM,		FMINNUM, FMAXNUM,
▲ Show 20 Lines • Show All 229 Lines • ▼ Show 20 Lines	enum NodeType {

// Masked load and store - consecutive vector load and store operations		// Masked load and store - consecutive vector load and store operations
// with additional mask operand that prevents memory accesses to the		// with additional mask operand that prevents memory accesses to the
// masked-off lanes.		// masked-off lanes.
//		//
// Val, OutChain = MLOAD(BasePtr, Mask, PassThru)		// Val, OutChain = MLOAD(BasePtr, Mask, PassThru)
// OutChain = MSTORE(Value, BasePtr, Mask)		// OutChain = MSTORE(Value, BasePtr, Mask)
MLOAD, MSTORE,		MLOAD, MSTORE,
		EVL_LOAD, EVL_STORE,

// Masked gather and scatter - load and store operations for a vector of		// Masked gather and scatter - load and store operations for a vector of
// random addresses with additional mask operand that prevents memory		// random addresses with additional mask operand that prevents memory
// accesses to the masked-off lanes.		// accesses to the masked-off lanes.
//		//
// Val, OutChain = GATHER(InChain, PassThru, Mask, BasePtr, Index, Scale)		// Val, OutChain = GATHER(InChain, PassThru, Mask, BasePtr, Index, Scale)
// OutChain = SCATTER(InChain, Value, Mask, BasePtr, Index, Scale)		// OutChain = SCATTER(InChain, Value, Mask, BasePtr, Index, Scale)
//		//
// The Index operand can have more vector elements than the other operands		// The Index operand can have more vector elements than the other operands
// due to type legalization. The extra elements are ignored.		// due to type legalization. The extra elements are ignored.
MGATHER, MSCATTER,		MGATHER, MSCATTER,
		EVL_GATHER, EVL_SCATTER,

/// This corresponds to the llvm.lifetime.* intrinsics. The first operand		/// This corresponds to the llvm.lifetime.* intrinsics. The first operand
/// is the chain and the second operand is the alloca pointer.		/// is the chain and the second operand is the alloca pointer.
LIFETIME_START, LIFETIME_END,		LIFETIME_START, LIFETIME_END,

/// GC_TRANSITION_START/GC_TRANSITION_END - These operators mark the		/// GC_TRANSITION_START/GC_TRANSITION_END - These operators mark the
/// beginning and end of GC transition sequence, and carry arbitrary		/// beginning and end of GC transition sequence, and carry arbitrary
/// information that target might need for lowering. The first operand is		/// information that target might need for lowering. The first operand is
Show All 15 Lines	enum NodeType {
/// operand is an initial scalar accumulator value, and the second operand		/// operand is an initial scalar accumulator value, and the second operand
/// is the vector to reduce.		/// is the vector to reduce.
VECREDUCE_STRICT_FADD, VECREDUCE_STRICT_FMUL,		VECREDUCE_STRICT_FADD, VECREDUCE_STRICT_FMUL,
/// These reductions are non-strict, and have a single vector operand.		/// These reductions are non-strict, and have a single vector operand.
VECREDUCE_FADD, VECREDUCE_FMUL,		VECREDUCE_FADD, VECREDUCE_FMUL,
VECREDUCE_ADD, VECREDUCE_MUL,		VECREDUCE_ADD, VECREDUCE_MUL,
VECREDUCE_AND, VECREDUCE_OR, VECREDUCE_XOR,		VECREDUCE_AND, VECREDUCE_OR, VECREDUCE_XOR,
VECREDUCE_SMAX, VECREDUCE_SMIN, VECREDUCE_UMAX, VECREDUCE_UMIN,		VECREDUCE_SMAX, VECREDUCE_SMIN, VECREDUCE_UMAX, VECREDUCE_UMIN,

		EVL_REDUCE_FADD, EVL_REDUCE_FMUL,
		EVL_REDUCE_ADD, EVL_REDUCE_MUL,
		EVL_REDUCE_AND, EVL_REDUCE_OR, EVL_REDUCE_XOR,
		EVL_REDUCE_SMAX, EVL_REDUCE_SMIN, EVL_REDUCE_UMAX, EVL_REDUCE_UMIN,

/// FMIN/FMAX nodes can have flags, for NaN/NoNaN variants.		/// FMIN/FMAX nodes can have flags, for NaN/NoNaN variants.
VECREDUCE_FMAX, VECREDUCE_FMIN,		VECREDUCE_FMAX, VECREDUCE_FMIN,
		EVL_REDUCE_FMAX, EVL_REDUCE_FMIN,

/// BUILTIN_OP_END - This must be the last enum value in this list.		/// BUILTIN_OP_END - This must be the last enum value in this list.
/// The target-specific pre-isel opcode values start here.		/// The target-specific pre-isel opcode values start here.
BUILTIN_OP_END		BUILTIN_OP_END
};		};

/// FIRST_TARGET_MEMORY_OPCODE - Target-specific pre-isel operations		/// FIRST_TARGET_MEMORY_OPCODE - Target-specific pre-isel operations
/// which do not reference a specific memory location should be less than		/// which do not reference a specific memory location should be less than
▲ Show 20 Lines • Show All 144 Lines • ▼ Show 20 Lines
/// SETCC_INVALID if it is not possible to represent the resultant comparison.		/// SETCC_INVALID if it is not possible to represent the resultant comparison.
CondCode getSetCCOrOperation(CondCode Op1, CondCode Op2, bool isInteger);		CondCode getSetCCOrOperation(CondCode Op1, CondCode Op2, bool isInteger);

/// Return the result of a logical AND between different comparisons of		/// Return the result of a logical AND between different comparisons of
/// identical values: ((X op1 Y) & (X op2 Y)). This function returns		/// identical values: ((X op1 Y) & (X op2 Y)). This function returns
/// SETCC_INVALID if it is not possible to represent the resultant comparison.		/// SETCC_INVALID if it is not possible to represent the resultant comparison.
CondCode getSetCCAndOperation(CondCode Op1, CondCode Op2, bool isInteger);		CondCode getSetCCAndOperation(CondCode Op1, CondCode Op2, bool isInteger);

		/// Return the mask operand of this EVL SDNode.
		/// Otw, return -1.
		int GetMaskPosEVL(unsigned OpCode);

		/// Return the vector length operand of this EVL SDNode.
		/// Otw, return -1.
		int GetVectorLengthPosEVL(unsigned OpCode);

		/// Translate this EVL OpCode to a native instruction OpCode.
		unsigned GetFunctionOpCodeForEVL(unsigned EVLOpCode);

		unsigned GetEVLForFunctionOpCode(unsigned OpCode);

} // end llvm::ISD namespace		} // end llvm::ISD namespace

} // end llvm namespace		} // end llvm namespace

#endif		#endif

include/llvm/CodeGen/SelectionDAG.h

Show First 20 Lines • Show All 1,078 Lines • ▼ Show 20 Lines	getTruncStore(SDValue Chain, const SDLoc &dl, SDValue Val, SDValue Ptr,
MachineMemOperand::Flags MMOFlags = MachineMemOperand::MONone,		MachineMemOperand::Flags MMOFlags = MachineMemOperand::MONone,
const AAMDNodes &AAInfo = AAMDNodes());		const AAMDNodes &AAInfo = AAMDNodes());
SDValue getTruncStore(SDValue Chain, const SDLoc &dl, SDValue Val,		SDValue getTruncStore(SDValue Chain, const SDLoc &dl, SDValue Val,
SDValue Ptr, EVT SVT, MachineMemOperand *MMO);		SDValue Ptr, EVT SVT, MachineMemOperand *MMO);
SDValue getIndexedStore(SDValue OrigStore, const SDLoc &dl, SDValue Base,		SDValue getIndexedStore(SDValue OrigStore, const SDLoc &dl, SDValue Base,
SDValue Offset, ISD::MemIndexedMode AM);		SDValue Offset, ISD::MemIndexedMode AM);

/// Returns sum of the base pointer and offset.		/// Returns sum of the base pointer and offset.
		SDValue getLoadEVL(EVT VT, const SDLoc &dl, SDValue Chain, SDValue Ptr,
		SDValue Mask, SDValue VLen, EVT MemVT,
		MachineMemOperand *MMO, ISD::LoadExtType);

		SDValue getStoreEVL(SDValue Chain, const SDLoc &dl, SDValue Val,
		SDValue Ptr, SDValue Mask, SDValue VLen,
		EVT MemVT, MachineMemOperand *MMO,
		bool IsTruncating = false);
		SDValue getGatherEVL(SDVTList VTs, EVT VT, const SDLoc &dl,
		ArrayRef<SDValue> Ops, MachineMemOperand *MMO);
		SDValue getScatterEVL(SDVTList VTs, EVT VT, const SDLoc &dl,
		ArrayRef<SDValue> Ops, MachineMemOperand *MMO);

		/// Returns sum of the base pointer and offset.
SDValue getMemBasePlusOffset(SDValue Base, unsigned Offset, const SDLoc &DL);		SDValue getMemBasePlusOffset(SDValue Base, unsigned Offset, const SDLoc &DL);

SDValue getMaskedLoad(EVT VT, const SDLoc &dl, SDValue Chain, SDValue Ptr,		SDValue getMaskedLoad(EVT VT, const SDLoc &dl, SDValue Chain, SDValue Ptr,
SDValue Mask, SDValue Src0, EVT MemVT,		SDValue Mask, SDValue Src0, EVT MemVT,
MachineMemOperand *MMO, ISD::LoadExtType,		MachineMemOperand *MMO, ISD::LoadExtType,
bool IsExpanding = false);		bool IsExpanding = false);
SDValue getMaskedStore(SDValue Chain, const SDLoc &dl, SDValue Val,		SDValue getMaskedStore(SDValue Chain, const SDLoc &dl, SDValue Val,
SDValue Ptr, SDValue Mask, EVT MemVT,		SDValue Ptr, SDValue Mask, EVT MemVT,
▲ Show 20 Lines • Show All 600 Lines • Show Last 20 Lines

include/llvm/CodeGen/SelectionDAGNodes.h

Show First 20 Lines • Show All 527 Lines • ▼ Show 20 Lines	class LSBaseSDNodeBitfields {

uint16_t AddressingMode : 3; // enum ISD::MemIndexedMode		uint16_t AddressingMode : 3; // enum ISD::MemIndexedMode
};		};
enum { NumLSBaseSDNodeBits = NumMemSDNodeBits + 3 };		enum { NumLSBaseSDNodeBits = NumMemSDNodeBits + 3 };

class LoadSDNodeBitfields {		class LoadSDNodeBitfields {
friend class LoadSDNode;		friend class LoadSDNode;
friend class MaskedLoadSDNode;		friend class MaskedLoadSDNode;
		friend class EVLLoadSDNode;

uint16_t : NumLSBaseSDNodeBits;		uint16_t : NumLSBaseSDNodeBits;

uint16_t ExtTy : 2; // enum ISD::LoadExtType		uint16_t ExtTy : 2; // enum ISD::LoadExtType
uint16_t IsExpanding : 1;		uint16_t IsExpanding : 1;
};		};

class StoreSDNodeBitfields {		class StoreSDNodeBitfields {
friend class StoreSDNode;		friend class StoreSDNode;
friend class MaskedStoreSDNode;		friend class MaskedStoreSDNode;
		friend class EVLStoreSDNode;

uint16_t : NumLSBaseSDNodeBits;		uint16_t : NumLSBaseSDNodeBits;

uint16_t IsTruncating : 1;		uint16_t IsTruncating : 1;
uint16_t IsCompressing : 1;		uint16_t IsCompressing : 1;
};		};

union {		union {
▲ Show 20 Lines • Show All 121 Lines • ▼ Show 20 Lines	switch (NodeType) {
case ISD::STRICT_FCEIL:		case ISD::STRICT_FCEIL:
case ISD::STRICT_FFLOOR:		case ISD::STRICT_FFLOOR:
case ISD::STRICT_FROUND:		case ISD::STRICT_FROUND:
case ISD::STRICT_FTRUNC:		case ISD::STRICT_FTRUNC:
return true;		return true;
}		}
}		}

		/// Test whether this is an Explicit Vector Length node.
		bool isEVL() const {
		switch (NodeType) {
		default:
		return false;
		case ISD::EVL_LOAD:
		case ISD::EVL_STORE:
		case ISD::EVL_GATHER:
		case ISD::EVL_SCATTER:

		case ISD::EVL_FNEG:

		case ISD::EVL_FADD:
		case ISD::EVL_FMUL:
		case ISD::EVL_FSUB:
		case ISD::EVL_FDIV:
		case ISD::EVL_FREM:

		case ISD::EVL_FMA:

		case ISD::EVL_ADD:
		case ISD::EVL_MUL:
		case ISD::EVL_SUB:
		case ISD::EVL_SRA:
		case ISD::EVL_SRL:
		case ISD::EVL_SHL:
		case ISD::EVL_UDIV:
		case ISD::EVL_SDIV:
		case ISD::EVL_UREM:
		case ISD::EVL_SREM:

		case ISD::EVL_EXPAND:
		case ISD::EVL_COMPRESS:
		case ISD::EVL_VSHIFT:
		case ISD::EVL_SETCC:
		case ISD::EVL_COMPOSE:

		case ISD::EVL_AND:
		case ISD::EVL_XOR:
		case ISD::EVL_OR:

		case ISD::EVL_REDUCE_ADD:
		case ISD::EVL_REDUCE_SMIN:
		case ISD::EVL_REDUCE_SMAX:
		case ISD::EVL_REDUCE_UMIN:
		case ISD::EVL_REDUCE_UMAX:

		case ISD::EVL_REDUCE_MUL:
		case ISD::EVL_REDUCE_AND:
		case ISD::EVL_REDUCE_OR:
		case ISD::EVL_REDUCE_FADD:
		case ISD::EVL_REDUCE_FMUL:
		case ISD::EVL_REDUCE_FMIN:
		case ISD::EVL_REDUCE_FMAX:

		return true;
		}
		}


/// Test if this node has a post-isel opcode, directly		/// Test if this node has a post-isel opcode, directly
/// corresponding to a MachineInstr opcode.		/// corresponding to a MachineInstr opcode.
bool isMachineOpcode() const { return NodeType < 0; }		bool isMachineOpcode() const { return NodeType < 0; }

/// This may only be called if isMachineOpcode returns		/// This may only be called if isMachineOpcode returns
/// true. It returns the MachineInstr opcode value that the node's opcode		/// true. It returns the MachineInstr opcode value that the node's opcode
/// corresponds to.		/// corresponds to.
unsigned getMachineOpcode() const {		unsigned getMachineOpcode() const {
▲ Show 20 Lines • Show All 671 Lines • ▼ Show 20 Lines	return N->getOpcode() == ISD::LOAD \|\|
N->getOpcode() == ISD::ATOMIC_LOAD_FADD \|\|		N->getOpcode() == ISD::ATOMIC_LOAD_FADD \|\|
N->getOpcode() == ISD::ATOMIC_LOAD_FSUB \|\|		N->getOpcode() == ISD::ATOMIC_LOAD_FSUB \|\|
N->getOpcode() == ISD::ATOMIC_LOAD \|\|		N->getOpcode() == ISD::ATOMIC_LOAD \|\|
N->getOpcode() == ISD::ATOMIC_STORE \|\|		N->getOpcode() == ISD::ATOMIC_STORE \|\|
N->getOpcode() == ISD::MLOAD \|\|		N->getOpcode() == ISD::MLOAD \|\|
N->getOpcode() == ISD::MSTORE \|\|		N->getOpcode() == ISD::MSTORE \|\|
N->getOpcode() == ISD::MGATHER \|\|		N->getOpcode() == ISD::MGATHER \|\|
N->getOpcode() == ISD::MSCATTER \|\|		N->getOpcode() == ISD::MSCATTER \|\|
		N->getOpcode() == ISD::EVL_LOAD \|\|
		N->getOpcode() == ISD::EVL_STORE \|\|
		N->getOpcode() == ISD::EVL_GATHER \|\|
		N->getOpcode() == ISD::EVL_SCATTER \|\|
N->isMemIntrinsic() \|\|		N->isMemIntrinsic() \|\|
N->isTargetMemoryOpcode();		N->isTargetMemoryOpcode();
}		}
};		};

/// This is an SDNode representing atomic operations.		/// This is an SDNode representing atomic operations.
class AtomicSDNode : public MemSDNode {		class AtomicSDNode : public MemSDNode {
public:		public:
▲ Show 20 Lines • Show All 757 Lines • ▼ Show 20 Lines	public:
const SDValue &getOffset() const { return getOperand(3); }		const SDValue &getOffset() const { return getOperand(3); }

static bool classof(const SDNode *N) {		static bool classof(const SDNode *N) {
return N->getOpcode() == ISD::STORE;		return N->getOpcode() == ISD::STORE;
}		}
};		};

/// This base class is used to represent MLOAD and MSTORE nodes		/// This base class is used to represent MLOAD and MSTORE nodes
		class EVLLoadStoreSDNode : public MemSDNode {
		public:
		friend class SelectionDAG;

		EVLLoadStoreSDNode(ISD::NodeType NodeTy, unsigned Order,
		const DebugLoc &dl, SDVTList VTs, EVT MemVT,
		MachineMemOperand *MMO)
		: MemSDNode(NodeTy, Order, dl, VTs, MemVT, MMO) {}

		// EVLLoadSDNode (Chain, ptr, mask, VLen)
		// EVLStoreSDNode (Chain, data, ptr, mask, VLen)
		// Mask is a vector of i1 elements, Vlen is i32
		const SDValue &getBasePtr() const {
		return getOperand(getOpcode() == ISD::EVL_LOAD ? 1 : 2);
		}
		const SDValue &getMask() const {
		return getOperand(getOpcode() == ISD::EVL_LOAD ? 2 : 3);
		}
		const SDValue &getVectorLength() const {
		return getOperand(getOpcode() == ISD::EVL_LOAD ? 3 : 4);
		}

		static bool classof(const SDNode *N) {
		return N->getOpcode() == ISD::EVL_LOAD \|\|
		N->getOpcode() == ISD::EVL_STORE;
		}
		};

		/// This class is used to represent an MLOAD node
		class EVLLoadSDNode : public EVLLoadStoreSDNode {
		public:
		friend class SelectionDAG;

		EVLLoadSDNode(unsigned Order, const DebugLoc &dl, SDVTList VTs,
		ISD::LoadExtType ETy, EVT MemVT,
		MachineMemOperand *MMO)
		: EVLLoadStoreSDNode(ISD::EVL_LOAD, Order, dl, VTs, MemVT, MMO) {
		LoadSDNodeBits.ExtTy = ETy;
		LoadSDNodeBits.IsExpanding = false;
		}

		ISD::LoadExtType getExtensionType() const {
		return static_cast<ISD::LoadExtType>(LoadSDNodeBits.ExtTy);
		}

		const SDValue &getBasePtr() const { return getOperand(1); }
		const SDValue &getMask() const { return getOperand(2); }
		const SDValue &getVectorLength() const { return getOperand(3); }

		static bool classof(const SDNode *N) {
		return N->getOpcode() == ISD::EVL_LOAD;
		}
		bool isExpandingLoad() const { return LoadSDNodeBits.IsExpanding; }
		};

		/// This class is used to represent an MSTORE node
		class EVLStoreSDNode : public EVLLoadStoreSDNode {
		public:
		friend class SelectionDAG;

		EVLStoreSDNode(unsigned Order, const DebugLoc &dl, SDVTList VTs,
		bool isTrunc, EVT MemVT,
		MachineMemOperand *MMO)
		: EVLLoadStoreSDNode(ISD::EVL_STORE, Order, dl, VTs, MemVT, MMO) {
		StoreSDNodeBits.IsTruncating = isTrunc;
		StoreSDNodeBits.IsCompressing = false;
		}

		/// Return true if the op does a truncation before store.
		/// For integers this is the same as doing a TRUNCATE and storing the result.
		/// For floats, it is the same as doing an FP_ROUND and storing the result.
		bool isTruncatingStore() const { return StoreSDNodeBits.IsTruncating; }

		/// Returns true if the op does a compression to the vector before storing.
		/// The node contiguously stores the active elements (integers or floats)
		/// in src (those with their respective bit set in writemask k) to unaligned
		/// memory at base_addr.
		bool isCompressingStore() const { return StoreSDNodeBits.IsCompressing; }

		const SDValue &getValue() const { return getOperand(1); }
		const SDValue &getBasePtr() const { return getOperand(2); }
		const SDValue &getMask() const { return getOperand(3); }
		const SDValue &getVectorLength() const { return getOperand(4); }

		static bool classof(const SDNode *N) {
		return N->getOpcode() == ISD::EVL_STORE;
		}
		};

		/// This base class is used to represent MLOAD and MSTORE nodes
class MaskedLoadStoreSDNode : public MemSDNode {		class MaskedLoadStoreSDNode : public MemSDNode {
public:		public:
friend class SelectionDAG;		friend class SelectionDAG;

MaskedLoadStoreSDNode(ISD::NodeType NodeTy, unsigned Order,		MaskedLoadStoreSDNode(ISD::NodeType NodeTy, unsigned Order,
const DebugLoc &dl, SDVTList VTs, EVT MemVT,		const DebugLoc &dl, SDVTList VTs, EVT MemVT,
MachineMemOperand *MMO)		MachineMemOperand *MMO)
: MemSDNode(NodeTy, Order, dl, VTs, MemVT, MMO) {}		: MemSDNode(NodeTy, Order, dl, VTs, MemVT, MMO) {}
▲ Show 20 Lines • Show All 71 Lines • ▼ Show 20 Lines	public:
const SDValue &getMask() const { return getOperand(3); }		const SDValue &getMask() const { return getOperand(3); }

static bool classof(const SDNode *N) {		static bool classof(const SDNode *N) {
return N->getOpcode() == ISD::MSTORE;		return N->getOpcode() == ISD::MSTORE;
}		}
};		};

/// This is a base class used to represent		/// This is a base class used to represent
		/// EVL_GATHER and EVL_SCATTER nodes
		///
		class EVLGatherScatterSDNode : public MemSDNode {
		public:
		friend class SelectionDAG;

		EVLGatherScatterSDNode(ISD::NodeType NodeTy, unsigned Order,
		const DebugLoc &dl, SDVTList VTs, EVT MemVT,
		MachineMemOperand *MMO)
		: MemSDNode(NodeTy, Order, dl, VTs, MemVT, MMO) {}

		// In the both nodes address is Op1, mask is Op2:
		// EVLGatherSDNode (Chain, base, index, scale, mask, vlen)
		// EVLScatterSDNode (Chain, value, base, index, sckae, mask, vlen)
		// Mask is a vector of i1 elements
		const SDValue &getBasePtr() const { return getOperand((getOpcode() == ISD::EVL_GATHER) ? 1 : 2); }
		const SDValue &getIndex() const { return getOperand((getOpcode() == ISD::EVL_GATHER) ? 2 : 3); }
		const SDValue &getScale() const { return getOperand((getOpcode() == ISD::EVL_GATHER) ? 3 : 4); }
		const SDValue &getMask() const { return getOperand((getOpcode() == ISD::EVL_GATHER) ? 4 : 5); }
		const SDValue &getVectorLength() const { return getOperand((getOpcode() == ISD::EVL_GATHER) ? 5 : 6); }

		static bool classof(const SDNode *N) {
		return N->getOpcode() == ISD::EVL_GATHER \|\|
		N->getOpcode() == ISD::EVL_SCATTER;
		}
		};

		/// This class is used to represent an EVL_GATHER node
		///
		class EVLGatherSDNode : public EVLGatherScatterSDNode {
		public:
		friend class SelectionDAG;

		EVLGatherSDNode(unsigned Order, const DebugLoc &dl, SDVTList VTs,
		EVT MemVT, MachineMemOperand *MMO)
		: EVLGatherScatterSDNode(ISD::EVL_GATHER, Order, dl, VTs, MemVT, MMO) {}

		static bool classof(const SDNode *N) {
		return N->getOpcode() == ISD::EVL_GATHER;
		}
		};

		/// This class is used to represent an EVL_SCATTER node
		///
		class EVLScatterSDNode : public EVLGatherScatterSDNode {
		public:
		friend class SelectionDAG;

		EVLScatterSDNode(unsigned Order, const DebugLoc &dl, SDVTList VTs,
		EVT MemVT, MachineMemOperand *MMO)
		: EVLGatherScatterSDNode(ISD::EVL_SCATTER, Order, dl, VTs, MemVT, MMO) {}

		const SDValue &getValue() const { return getOperand(1); }

		static bool classof(const SDNode *N) {
		return N->getOpcode() == ISD::EVL_SCATTER;
		}
		};


		/// This is a base class used to represent
/// MGATHER and MSCATTER nodes		/// MGATHER and MSCATTER nodes
///		///
class MaskedGatherScatterSDNode : public MemSDNode {		class MaskedGatherScatterSDNode : public MemSDNode {
public:		public:
friend class SelectionDAG;		friend class SelectionDAG;

MaskedGatherScatterSDNode(ISD::NodeType NodeTy, unsigned Order,		MaskedGatherScatterSDNode(ISD::NodeType NodeTy, unsigned Order,
const DebugLoc &dl, SDVTList VTs, EVT MemVT,		const DebugLoc &dl, SDVTList VTs, EVT MemVT,
▲ Show 20 Lines • Show All 276 Lines • Show Last 20 Lines

include/llvm/IR/Attributes.td

	Show First 20 Lines • Show All 124 Lines • ▼ Show 20 Lines
	def ReadNone : EnumAttr<"readnone">;			def ReadNone : EnumAttr<"readnone">;

	/// Function only reads from memory.			/// Function only reads from memory.
	def ReadOnly : EnumAttr<"readonly">;			def ReadOnly : EnumAttr<"readonly">;

	/// Return value is always equal to this argument.			/// Return value is always equal to this argument.
	def Returned : EnumAttr<"returned">;			def Returned : EnumAttr<"returned">;

				/// Return value that is equal to this argument on enabled lanes (mask).
				def Passthru : EnumAttr<"passthru">;

				/// Mask argument that applies to this function.
				def Mask : EnumAttr<"mask">;

				/// Dynamic Vector Length argument of this function.
				def VectorLength : EnumAttr<"vlen">;

	/// Function can return twice.			/// Function can return twice.
	def ReturnsTwice : EnumAttr<"returns_twice">;			def ReturnsTwice : EnumAttr<"returns_twice">;

	/// Safe Stack protection.			/// Safe Stack protection.
	def SafeStack : EnumAttr<"safestack">;			def SafeStack : EnumAttr<"safestack">;

	/// Shadow Call Stack protection.			/// Shadow Call Stack protection.
	def ShadowCallStack : EnumAttr<"shadowcallstack">;			def ShadowCallStack : EnumAttr<"shadowcallstack">;
	▲ Show 20 Lines • Show All 108 Lines • Show Last 20 Lines

include/llvm/IR/EVLBuilder.h

This file was added.

				#ifndef LLVM_IR_EVLBUILDER_H
				#define LLVM_IR_EVLBUILDER_H

				#include <llvm/IR/IRBuilder.h>
				#include <llvm/IR/Value.h>
				#include <llvm/IR/Instruction.h>

				namespace llvm {

				enum class EVLTypeToken : int8_t {
				Scalar = 1, // scalar operand type
				Vector = 2, // vectorized operand type
				Mask = 3 // vector mask type
				};

				using TypeTokenVec = SmallVector<EVLTypeToken, 4>;
				using ShortTypeVec = SmallVector<Type*, 4>;
				using ShortValueVec = SmallVector<Value*, 4>;

				struct
				EVLIntrinsicDesc {
				Intrinsic::ID ID; // LLVM Intrinsic ID.
				TypeTokenVec typeTokens; // Type Parmeters for the LLVM Intrinsic.
				int MaskPos; // Parameter index of the Mask parameter.
				int EVLPos; // Parameter index of the EVL parameter.
				};

				using ValArray = ArrayRef<Value*>;

				class EVLBuilder {
				IRBuilder<> & Builder;
				// Explicit mask parameter
				Value * Mask;
				// Explicit vector length parameter
				Value * ExplicitVectorLength;
				// Compile-time vector length
				int StaticVectorLength;

				// get a vlaid mask/evl argument for the current predication contet
				Value& GetMaskForType(VectorType & VecTy);
				Value& GetEVLForType(VectorType & VecTy);

				public:
				EVLBuilder(IRBuilder<> & _builder)
				: Builder(_builder)
				, Mask(nullptr)
				, ExplicitVectorLength(nullptr)
				, StaticVectorLength(-1)
				{}

				Module & getModule() const;

				// The cannonical vector type for this \p ElementTy
				VectorType& getVectorType(Type &ElementTy);

				// Predication context tracker
				EVLBuilder& setMask(Value * _Mask) { Mask = _Mask; return *this; }
				EVLBuilder& setEVL(Value * _ExplicitVectorLength) { ExplicitVectorLength = _ExplicitVectorLength; return *this; }
				EVLBuilder& setStaticVL(int VLen) { StaticVectorLength = VLen; return *this; }

				EVLIntrinsicDesc GetEVLIntrinsicDesc(unsigned OC);

				// Create a map-vectorized copy of the instruction \p Inst with the underlying IRBuilder instance.
				// This operation may return nullptr if the instruction could not be vectorized.
				Value* CreateVectorCopy(Instruction & Inst, ValArray VecOpArray);

				Value& CreateGEP(ValArray VecOpArray);

				Value& CreateFAdd(ValArray VecOpArray);
				Value& CreateFDiv(ValArray VecOpArray);
				Value& CreateFMul(ValArray VecOpArray);
				Value& CreateFSub(ValArray VecOpArray);

				// Memory
				Value& CreateContiguousStore(Value & Val, Value & Pointer);
				Value& CreateContiguousLoad(Value & Pointer);
				Value& CreateScatter(Value & Val, Value & PointerVec);
				Value& CreateGather(Value & PointerVec);
				};


				} // namespace llvm

				#endif // LLVM_IR_EVLBUILDER_H

include/llvm/IR/IntrinsicInst.h

Show First 20 Lines • Show All 199 Lines • ▼ Show 20 Lines	static bool classof(const IntrinsicInst *I) {
return I->getIntrinsicID() == Intrinsic::dbg_label;		return I->getIntrinsicID() == Intrinsic::dbg_label;
}		}
static bool classof(const Value *V) {		static bool classof(const Value *V) {
return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));		return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));
}		}
/// @}		/// @}
};		};

		class EVLIntrinsic : public IntrinsicInst {
		public:

		bool isUnaryOp() const;
		bool isBinaryOp() const;
		bool isTernaryOp() const;

		CmpInst::Predicate getCmpPredicate() const;

		Value* GetMask() const;
		Value* GetVectorLength() const;

		// Methods for support type inquiry through isa, cast, and dyn_cast:
		static bool classof(const IntrinsicInst *I) {
		switch (I->getIntrinsicID()) {

		case Intrinsic::evl_cmp:

		case Intrinsic::evl_and:
		case Intrinsic::evl_or:
		case Intrinsic::evl_xor:
		case Intrinsic::evl_ashr:
		case Intrinsic::evl_lshr:
		case Intrinsic::evl_shl:

		case Intrinsic::evl_select:
		case Intrinsic::evl_compose:
		case Intrinsic::evl_compress:
		case Intrinsic::evl_expand:
		case Intrinsic::evl_vshift:

		case Intrinsic::evl_load:
		case Intrinsic::evl_store:
		case Intrinsic::evl_gather:
		case Intrinsic::evl_scatter:

		case Intrinsic::evl_fneg:

		case Intrinsic::evl_fadd:
		case Intrinsic::evl_fsub:
		case Intrinsic::evl_fmul:
		case Intrinsic::evl_fdiv:
		case Intrinsic::evl_frem:

		case Intrinsic::evl_fma:

		case Intrinsic::evl_add:
		case Intrinsic::evl_sub:
		case Intrinsic::evl_mul:
		case Intrinsic::evl_udiv:
		case Intrinsic::evl_sdiv:
		case Intrinsic::evl_urem:
		case Intrinsic::evl_srem:

		case Intrinsic::evl_reduce_add:
		case Intrinsic::evl_reduce_mul:
		case Intrinsic::evl_reduce_umin:
		case Intrinsic::evl_reduce_umax:
		case Intrinsic::evl_reduce_smin:
		case Intrinsic::evl_reduce_smax:

		case Intrinsic::evl_reduce_and:
		case Intrinsic::evl_reduce_or:
		case Intrinsic::evl_reduce_xor:

		case Intrinsic::evl_reduce_fadd:
		case Intrinsic::evl_reduce_fmul:
		case Intrinsic::evl_reduce_fmin:
		case Intrinsic::evl_reduce_fmax:
		return true;

		default: return false;
		}
		}
		static bool classof(const Value *V) {
		return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));
		}
		};

/// This is the common base class for constrained floating point intrinsics.		/// This is the common base class for constrained floating point intrinsics.
class ConstrainedFPIntrinsic : public IntrinsicInst {		class ConstrainedFPIntrinsic : public IntrinsicInst {
public:		public:
enum RoundingMode {		enum RoundingMode {
rmInvalid,		rmInvalid,
rmDynamic,		rmDynamic,
rmToNearest,		rmToNearest,
rmDownward,		rmDownward,
▲ Show 20 Lines • Show All 570 Lines • Show Last 20 Lines

include/llvm/IR/Intrinsics.td

	Show First 20 Lines • Show All 81 Lines • ▼ Show 20 Lines
	}			}

	// ReadNone - The specified argument pointer is not dereferenced by the			// ReadNone - The specified argument pointer is not dereferenced by the
	// intrinsic.			// intrinsic.
	class ReadNone<int argNo> : IntrinsicProperty {			class ReadNone<int argNo> : IntrinsicProperty {
	int ArgNo = argNo;			int ArgNo = argNo;
	}			}

				// VectorLength - The specified argument is the Dynamic Vector Length of the
				// operation.
				class VectorLength<int argNo> : IntrinsicProperty {
				int ArgNo = argNo;
				}

				// Mask - The specified argument contains the per-lane mask of this
				// intrinsic. Inputs on masked-out lanes must not affect the result of this
				// intrinsic (except for the Passthru argument).
				class Mask<int argNo> : IntrinsicProperty {
				int ArgNo = argNo;
				}
				// Passthru - The specified argument contains the per-lane return value
				// for this vector intrinsic where the mask is false.
				// (requires the Mask attribute in the same function)
				class Passthru<int argNo> : IntrinsicProperty {
				int ArgNo = argNo;
				}

	def IntrNoReturn : IntrinsicProperty;			def IntrNoReturn : IntrinsicProperty;

	// IntrCold - Calls to this intrinsic are cold.			// IntrCold - Calls to this intrinsic are cold.
	// Parallels the cold attribute on LLVM IR functions.			// Parallels the cold attribute on LLVM IR functions.
	def IntrCold : IntrinsicProperty;			def IntrCold : IntrinsicProperty;

	// IntrNoduplicate - Calls to this intrinsic cannot be duplicated.			// IntrNoduplicate - Calls to this intrinsic cannot be duplicated.
	// Parallels the noduplicate attribute on LLVM IR functions.			// Parallels the noduplicate attribute on LLVM IR functions.
	▲ Show 20 Lines • Show All 892 Lines • ▼ Show 20 Lines
	// Clear cache intrinsic, default to ignore (ie. emit nothing)			// Clear cache intrinsic, default to ignore (ie. emit nothing)
	// maps to void __clear_cache() on supporting platforms			// maps to void __clear_cache() on supporting platforms
	def int_clear_cache : Intrinsic<[], [llvm_ptr_ty, llvm_ptr_ty],			def int_clear_cache : Intrinsic<[], [llvm_ptr_ty, llvm_ptr_ty],
	[], "llvm.clear_cache">;			[], "llvm.clear_cache">;

	// Intrinsic to detect whether its argument is a constant.			// Intrinsic to detect whether its argument is a constant.
	def int_is_constant : Intrinsic<[llvm_i1_ty], [llvm_any_ty], [IntrNoMem], "llvm.is.constant">;			def int_is_constant : Intrinsic<[llvm_i1_ty], [llvm_any_ty], [IntrNoMem], "llvm.is.constant">;

				//===---------------- Masked/Explicit Vector Length Intrinsics --------------===//

				// Memory Intrinsics
				def int_evl_store : Intrinsic<[],
				rkruppeUnsubmitted Done Reply Inline Actions One of these i32 arguments it the `dynamic_vl` argument, what's the other? Alignment? rkruppe: One of these i32 arguments it the `dynamic_vl` argument, what's the other? Alignment?
				simollAuthorUnsubmitted Done Reply Inline Actions Yep. That's alignment as in llvm.masked.store. simoll: Yep. That's alignment as in [llvm.masked.store](https://llvm.org/docs/LangRef.html#llvm-masked…
				rkruppeUnsubmitted Done Reply Inline Actions Ah, I forgot that the existing masked intrinsics also take the alignment as a normal parameter. I think new intrinsics shouldn't follow that precedent, nowadays we have the `align` attribute for call sites (already used e.g. by `llvm.memcpy`), so the alignment information can be supplied like this: call void @llvm.evl.store(<4 x i32> %v, <4 x i32>* align 16 %p, ...) This ensures that the alignment is a compile time constant, and during instruction selection and later stages, it should be stored in the MachineMemOperand, not be an extra operand (that's the part that caused me trouble in my experiments). rkruppe: Ah, I forgot that the existing masked intrinsics also take the alignment as a normal parameter.
				simollAuthorUnsubmitted Done Reply Inline Actions Ok. I'll drop the alignment arguments in the next update. simoll: Ok. I'll drop the alignment arguments in the next update.
				[ llvm_anyvector_ty,
				LLVMAnyPointerType<LLVMMatchType<0>>,
				LLVMVectorSameWidth<0, llvm_i1_ty>,
				llvm_i32_ty],
				[ IntrArgMemOnly, Mask<2>, VectorLength<3> ]>;

				def int_evl_load : Intrinsic<[ llvm_anyvector_ty],
				[ LLVMAnyPointerType<LLVMMatchType<0>>,
				LLVMVectorSameWidth<0, llvm_i1_ty>,
				llvm_i32_ty],
				[ IntrReadMem, IntrArgMemOnly, Mask<1>, VectorLength<2> ]>;

				def int_evl_gather: Intrinsic<[ llvm_anyvector_ty],
				[ LLVMVectorOfAnyPointersToElt<0>,
				LLVMVectorSameWidth<0, llvm_i1_ty>,
				llvm_i32_ty],
				[ IntrReadMem, IntrReadMem, Mask<1>, VectorLength<2> ]>;

				def int_evl_scatter: Intrinsic<[],
				[ llvm_anyvector_ty,
				LLVMVectorOfAnyPointersToElt<0>,
				LLVMVectorSameWidth<0, llvm_i1_ty>,
				llvm_i32_ty],
				[ IntrArgMemOnly, Mask<2>, VectorLength<3> ]>;

				// Reductions
				let IntrProperties = [IntrNoMem, Mask<2>, VectorLength<3>] in {
				def int_evl_reduce_add : Intrinsic<[llvm_anyint_ty],
				[LLVMMatchType<0>,
				llvm_anyvector_ty,
				LLVMVectorSameWidth<1, llvm_i1_ty>,
				llvm_i32_ty]>;
				def int_evl_reduce_mul : Intrinsic<[llvm_anyint_ty],
				[LLVMMatchType<0>,
				llvm_anyvector_ty,
				LLVMVectorSameWidth<1, llvm_i1_ty>,
				llvm_i32_ty]>;
				def int_evl_reduce_and : Intrinsic<[llvm_anyint_ty],
				[LLVMMatchType<0>,
				llvm_anyvector_ty,
				LLVMVectorSameWidth<1, llvm_i1_ty>,
				llvm_i32_ty]>;
				def int_evl_reduce_or : Intrinsic<[llvm_anyint_ty],
				[LLVMMatchType<0>,
				llvm_anyvector_ty,
				LLVMVectorSameWidth<1, llvm_i1_ty>,
				llvm_i32_ty]>;
				def int_evl_reduce_xor : Intrinsic<[llvm_anyint_ty],
				[LLVMMatchType<0>,
				llvm_anyvector_ty,
				LLVMVectorSameWidth<1, llvm_i1_ty>,
				llvm_i32_ty]>;
				def int_evl_reduce_smax : Intrinsic<[llvm_anyint_ty],
				[LLVMMatchType<0>,
				llvm_anyvector_ty,
				LLVMVectorSameWidth<1, llvm_i1_ty>,
				llvm_i32_ty]>;
				def int_evl_reduce_smin : Intrinsic<[llvm_anyint_ty],
				[LLVMMatchType<0>,
				llvm_anyvector_ty,
				LLVMVectorSameWidth<1, llvm_i1_ty>,
				llvm_i32_ty]>;
				def int_evl_reduce_umax : Intrinsic<[llvm_anyint_ty],
				[LLVMMatchType<0>,
				llvm_anyvector_ty,
				LLVMVectorSameWidth<1, llvm_i1_ty>,
				llvm_i32_ty]>;
				def int_evl_reduce_umin : Intrinsic<[llvm_anyint_ty],
				[LLVMMatchType<0>,
				llvm_anyvector_ty,
				LLVMVectorSameWidth<1, llvm_i1_ty>,
				llvm_i32_ty]>;

				def int_evl_reduce_fadd : Intrinsic<[llvm_anyfloat_ty],
				[LLVMMatchType<0>,
				llvm_anyvector_ty,
				LLVMVectorSameWidth<1, llvm_i1_ty>,
				llvm_i32_ty]>;
				def int_evl_reduce_fmul : Intrinsic<[llvm_anyfloat_ty],
				[LLVMMatchType<0>,
				llvm_anyvector_ty,
				LLVMVectorSameWidth<1, llvm_i1_ty>,
				llvm_i32_ty]>;
				def int_evl_reduce_fmax : Intrinsic<[llvm_anyfloat_ty],
				[LLVMMatchType<0>,
				llvm_anyvector_ty,
				LLVMVectorSameWidth<1, llvm_i1_ty>,
				llvm_i32_ty]>;
				def int_evl_reduce_fmin : Intrinsic<[llvm_anyfloat_ty],
				[LLVMMatchType<0>,
				llvm_anyvector_ty,
				LLVMVectorSameWidth<1, llvm_i1_ty>,
				llvm_i32_ty]>;
				}

				// Binary operators
				let IntrProperties = [IntrNoMem, Mask<2>, VectorLength<3>] in {
				def int_evl_add : Intrinsic<[ llvm_anyvector_ty ],
				[ LLVMMatchType<0>,
				LLVMMatchType<0>,
				LLVMVectorSameWidth<0, llvm_i1_ty>,
				llvm_i32_ty]>;
				def int_evl_sub : Intrinsic<[ llvm_anyvector_ty ],
				[ LLVMMatchType<0>,
				LLVMMatchType<0>,
				LLVMVectorSameWidth<0, llvm_i1_ty>,
				llvm_i32_ty]>;
				def int_evl_mul : Intrinsic<[ llvm_anyvector_ty ],
				[ LLVMMatchType<0>,
				LLVMMatchType<0>,
				LLVMVectorSameWidth<0, llvm_i1_ty>,
				llvm_i32_ty]>;
				def int_evl_sdiv : Intrinsic<[ llvm_anyvector_ty ],
				[ LLVMMatchType<0>,
				LLVMMatchType<0>,
				LLVMVectorSameWidth<0, llvm_i1_ty>,
				llvm_i32_ty]>;
				def int_evl_udiv : Intrinsic<[ llvm_anyvector_ty ],
				[ LLVMMatchType<0>,
				LLVMMatchType<0>,
				LLVMVectorSameWidth<0, llvm_i1_ty>,
				llvm_i32_ty]>;
				def int_evl_srem : Intrinsic<[ llvm_anyvector_ty ],
				[ LLVMMatchType<0>,
				LLVMMatchType<0>,
				LLVMVectorSameWidth<0, llvm_i1_ty>,
				llvm_i32_ty]>;
				def int_evl_urem : Intrinsic<[ llvm_anyvector_ty ],
				[ LLVMMatchType<0>,
				LLVMMatchType<0>,
				LLVMVectorSameWidth<0, llvm_i1_ty>,
				llvm_i32_ty]>;

				def int_evl_fadd : Intrinsic<[ llvm_anyvector_ty ],
				[ LLVMMatchType<0>,
				LLVMMatchType<0>,
				LLVMVectorSameWidth<0, llvm_i1_ty>,
				llvm_i32_ty]>;
				def int_evl_fsub : Intrinsic<[ llvm_anyvector_ty ],
				[ LLVMMatchType<0>,
				LLVMMatchType<0>,
				LLVMVectorSameWidth<0, llvm_i1_ty>,
				llvm_i32_ty]>;
				def int_evl_fmul : Intrinsic<[ llvm_anyvector_ty ],
				[ LLVMMatchType<0>,
				LLVMMatchType<0>,
				LLVMVectorSameWidth<0, llvm_i1_ty>,
				llvm_i32_ty]>;
				def int_evl_fdiv : Intrinsic<[ llvm_anyvector_ty ],
				[ LLVMMatchType<0>,
				LLVMMatchType<0>,
				LLVMVectorSameWidth<0, llvm_i1_ty>,
				llvm_i32_ty]>;
				def int_evl_frem : Intrinsic<[ llvm_anyvector_ty ],
				[ LLVMMatchType<0>,
				LLVMMatchType<0>,
				LLVMVectorSameWidth<0, llvm_i1_ty>,
				llvm_i32_ty]>;

				// Logical operators
				def int_evl_ashr : Intrinsic<[ llvm_anyvector_ty ],
				[ LLVMMatchType<0>,
				LLVMMatchType<0>,
				LLVMVectorSameWidth<0, llvm_i1_ty>,
				llvm_i32_ty]>;
				def int_evl_lshr : Intrinsic<[ llvm_anyvector_ty ],
				[ LLVMMatchType<0>,
				LLVMMatchType<0>,
				LLVMVectorSameWidth<0, llvm_i1_ty>,
				llvm_i32_ty]>;
				def int_evl_shl : Intrinsic<[ llvm_anyvector_ty ],
				[ LLVMMatchType<0>,
				LLVMMatchType<0>,
				LLVMVectorSameWidth<0, llvm_i1_ty>,
				llvm_i32_ty]>;
				def int_evl_or : Intrinsic<[ llvm_anyvector_ty ],
				[ LLVMMatchType<0>,
				LLVMMatchType<0>,
				LLVMVectorSameWidth<0, llvm_i1_ty>,
				llvm_i32_ty]>;
				def int_evl_and : Intrinsic<[ llvm_anyvector_ty ],
				[ LLVMMatchType<0>,
				LLVMMatchType<0>,
				LLVMVectorSameWidth<0, llvm_i1_ty>,
				llvm_i32_ty]>;
				def int_evl_xor : Intrinsic<[ llvm_anyvector_ty ],
				[ LLVMMatchType<0>,
				LLVMMatchType<0>,
				LLVMVectorSameWidth<0, llvm_i1_ty>,
				llvm_i32_ty]>;

				// Comparison
				// The last argument is the comparison predicate
				def int_evl_cmp : Intrinsic<[ llvm_anyvector_ty ],
				[ llvm_anyvector_ty,
				LLVMMatchType<1>,
				LLVMVectorSameWidth<0, llvm_i1_ty>,
				llvm_i32_ty,
				llvm_i8_ty]>;
				}



				def int_evl_fneg : Intrinsic<[ llvm_anyvector_ty ],
				[ LLVMMatchType<0>,
				LLVMVectorSameWidth<0, llvm_i1_ty>,
				llvm_i32_ty],
				[ IntrNoMem, Mask<1>, VectorLength<2> ]>;

				def int_evl_fma : Intrinsic<[ llvm_anyvector_ty ],
				[ LLVMMatchType<0>,
				LLVMMatchType<0>,
				LLVMMatchType<0>,
				LLVMVectorSameWidth<0, llvm_i1_ty>,
				llvm_i32_ty],
				[ IntrNoMem, Mask<3>, VectorLength<4> ]>;

				// Shuffle
				def int_evl_vshift: Intrinsic<[ llvm_anyvector_ty ],
				[ LLVMMatchType<0>,
				llvm_i32_ty,
				LLVMVectorSameWidth<0, llvm_i1_ty>,
				llvm_i32_ty],
				[ IntrNoMem, Mask<2>, VectorLength<3> ]>;

				def int_evl_expand: Intrinsic<[ llvm_anyvector_ty ],
				[ LLVMMatchType<0>,
				LLVMMatchType<0>,
				LLVMVectorSameWidth<0, llvm_i1_ty>,
				llvm_i32_ty],
				[ IntrNoMem, Mask<2>, VectorLength<3> ]>;

				def int_evl_compress: Intrinsic<[ llvm_anyvector_ty ],
				[ LLVMMatchType<0>,
				LLVMMatchType<0>,
				LLVMVectorSameWidth<0, llvm_i1_ty>,
				llvm_i32_ty],
				[ IntrNoMem, Mask<2>, VectorLength<3> ]>;

				// Select
				def int_evl_select : Intrinsic<[ llvm_anyvector_ty ],
				[ LLVMMatchType<0>,
				LLVMMatchType<0>,
				LLVMVectorSameWidth<0, llvm_i1_ty>,
				llvm_i32_ty],
				[ IntrNoMem, Passthru<1>, Mask<2>, VectorLength<3> ]>;

				// Compose
				def int_evl_compose : Intrinsic<[ llvm_anyvector_ty ],
				[ LLVMMatchType<0>,
				LLVMMatchType<0>,
				llvm_i32_ty,
				llvm_i32_ty],
				[ IntrNoMem, VectorLength<3> ]>;




	//===-------------------------- Masked Intrinsics -------------------------===//			//===-------------------------- Masked Intrinsics -------------------------===//
	//			//
	def int_masked_store : Intrinsic<[], [llvm_anyvector_ty,			def int_masked_store : Intrinsic<[], [llvm_anyvector_ty,
	LLVMAnyPointerType<LLVMMatchType<0>>,			LLVMAnyPointerType<LLVMMatchType<0>>,
	llvm_i32_ty,			llvm_i32_ty,
	LLVMVectorSameWidth<0, llvm_i1_ty>],			LLVMVectorSameWidth<0, llvm_i1_ty>],
	[IntrArgMemOnly]>;			[IntrArgMemOnly]>;
	▲ Show 20 Lines • Show All 155 Lines • Show Last 20 Lines

include/llvm/Target/TargetSelectionDAG.td

Show First 20 Lines • Show All 122 Lines • ▼ Show 20 Lines
]>;		]>;
def SDTIntBinHiLoOp : SDTypeProfile<2, 2, [ // mulhi, mullo, sdivrem, udivrem		def SDTIntBinHiLoOp : SDTypeProfile<2, 2, [ // mulhi, mullo, sdivrem, udivrem
SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>, SDTCisSameAs<0, 3>,SDTCisInt<0>		SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>, SDTCisSameAs<0, 3>,SDTCisInt<0>
]>;		]>;
def SDTIntScaledBinOp : SDTypeProfile<1, 3, [ // smulfix		def SDTIntScaledBinOp : SDTypeProfile<1, 3, [ // smulfix
SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>, SDTCisInt<0>, SDTCisInt<3>		SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>, SDTCisInt<0>, SDTCisInt<3>
]>;		]>;

		def SDTIntBinOpEVL : SDTypeProfile<1, 4, [ // evl_add, evl_and, etc.
		SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>, SDTCisInt<0>, SDTCisInt<4>, SDTCisSameNumEltsAs<0, 3>
		]>;
		def SDTIntShiftOpEVL : SDTypeProfile<1, 4, [ // shl, sra, srl
		SDTCisSameAs<0, 1>, SDTCisInt<0>, SDTCisInt<2>, SDTCisInt<4>, SDTCisSameNumEltsAs<0, 3>
		]>;

def SDTFPBinOp : SDTypeProfile<1, 2, [ // fadd, fmul, etc.		def SDTFPBinOp : SDTypeProfile<1, 2, [ // fadd, fmul, etc.
SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>, SDTCisFP<0>		SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>, SDTCisFP<0>
]>;		]>;
def SDTFPSignOp : SDTypeProfile<1, 2, [ // fcopysign.		def SDTFPSignOp : SDTypeProfile<1, 2, [ // fcopysign.
SDTCisSameAs<0, 1>, SDTCisFP<0>, SDTCisFP<2>		SDTCisSameAs<0, 1>, SDTCisFP<0>, SDTCisFP<2>
]>;		]>;
def SDTFPTernaryOp : SDTypeProfile<1, 3, [ // fmadd, fnmsub, etc.		def SDTFPTernaryOp : SDTypeProfile<1, 3, [ // fmadd, fnmsub, etc.
SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>, SDTCisSameAs<0, 3>, SDTCisFP<0>		SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>, SDTCisSameAs<0, 3>, SDTCisFP<0>
Show All 26 Lines	def SDTExtInreg : SDTypeProfile<1, 2, [ // sext_inreg
SDTCisSameAs<0, 1>, SDTCisInt<0>, SDTCisVT<2, OtherVT>,		SDTCisSameAs<0, 1>, SDTCisInt<0>, SDTCisVT<2, OtherVT>,
SDTCisVTSmallerThanOp<2, 1>		SDTCisVTSmallerThanOp<2, 1>
]>;		]>;
def SDTExtInvec : SDTypeProfile<1, 1, [ // sext_invec		def SDTExtInvec : SDTypeProfile<1, 1, [ // sext_invec
SDTCisInt<0>, SDTCisVec<0>, SDTCisInt<1>, SDTCisVec<1>,		SDTCisInt<0>, SDTCisVec<0>, SDTCisInt<1>, SDTCisVec<1>,
SDTCisOpSmallerThanOp<1, 0>		SDTCisOpSmallerThanOp<1, 0>
]>;		]>;

		def SDTFPUnOpEVL : SDTypeProfile<1, 3, [ // evl_fneg, etc.
		SDTCisSameAs<0, 1>, SDTCisFP<0>, SDTCisInt<3>, SDTCisSameNumEltsAs<0, 2>
		]>;
		def SDTFPBinOpEVL : SDTypeProfile<1, 4, [ // evl_fadd, etc.
		SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>, SDTCisFP<0>, SDTCisInt<4>, SDTCisSameNumEltsAs<0, 3>
		]>;
		def SDTFPTernaryOpEVL : SDTypeProfile<1, 5, [ // evl_fmadd, etc.
		SDTCisSameAs<0, 1>, SDTCisSameAs<0, 2>, SDTCisSameAs<0, 3>, SDTCisFP<0>, SDTCisInt<5>, SDTCisSameNumEltsAs<0, 4>
		]>;

def SDTSetCC : SDTypeProfile<1, 3, [ // setcc		def SDTSetCC : SDTypeProfile<1, 3, [ // setcc
SDTCisInt<0>, SDTCisSameAs<1, 2>, SDTCisVT<3, OtherVT>		SDTCisInt<0>, SDTCisSameAs<1, 2>, SDTCisVT<3, OtherVT>
]>;		]>;

def SDTSelect : SDTypeProfile<1, 3, [ // select		def SDTSelect : SDTypeProfile<1, 3, [ // select
SDTCisInt<1>, SDTCisSameAs<0, 2>, SDTCisSameAs<2, 3>		SDTCisInt<1>, SDTCisSameAs<0, 2>, SDTCisSameAs<2, 3>
]>;		]>;

def SDTVSelect : SDTypeProfile<1, 3, [ // vselect		def SDTVSelect : SDTypeProfile<1, 3, [ // vselect
SDTCisVec<0>, SDTCisInt<1>, SDTCisSameAs<0, 2>, SDTCisSameAs<2, 3>, SDTCisSameNumEltsAs<0, 1>		SDTCisVec<0>, SDTCisInt<1>, SDTCisSameAs<0, 2>, SDTCisSameAs<2, 3>, SDTCisSameNumEltsAs<0, 1>
]>;		]>;

		def SDTVSelectEVL : SDTypeProfile<1, 5, [ // evl_vselect
		SDTCisVec<0>, SDTCisInt<1>, SDTCisSameAs<0, 2>, SDTCisSameAs<2, 3>, SDTCisSameNumEltsAs<0, 1>, SDTCisInt<5>, SDTCisSameNumEltsAs<0, 4>
		]>;

def SDTSelectCC : SDTypeProfile<1, 5, [ // select_cc		def SDTSelectCC : SDTypeProfile<1, 5, [ // select_cc
SDTCisSameAs<1, 2>, SDTCisSameAs<3, 4>, SDTCisSameAs<0, 3>,		SDTCisSameAs<1, 2>, SDTCisSameAs<3, 4>, SDTCisSameAs<0, 3>,
SDTCisVT<5, OtherVT>		SDTCisVT<5, OtherVT>
]>;		]>;

def SDTBr : SDTypeProfile<0, 1, [ // br		def SDTBr : SDTypeProfile<0, 1, [ // br
SDTCisVT<0, OtherVT>		SDTCisVT<0, OtherVT>
]>;		]>;
Show All 27 Lines
def SDTIStore : SDTypeProfile<1, 3, [ // indexed store		def SDTIStore : SDTypeProfile<1, 3, [ // indexed store
SDTCisSameAs<0, 2>, SDTCisPtrTy<0>, SDTCisPtrTy<3>		SDTCisSameAs<0, 2>, SDTCisPtrTy<0>, SDTCisPtrTy<3>
]>;		]>;

def SDTMaskedStore: SDTypeProfile<0, 3, [ // masked store		def SDTMaskedStore: SDTypeProfile<0, 3, [ // masked store
SDTCisVec<0>, SDTCisPtrTy<1>, SDTCisVec<2>, SDTCisSameNumEltsAs<0, 2>		SDTCisVec<0>, SDTCisPtrTy<1>, SDTCisVec<2>, SDTCisSameNumEltsAs<0, 2>
]>;		]>;

		def SDTStoreEVL: SDTypeProfile<0, 4, [ // evl store
		SDTCisVec<0>, SDTCisPtrTy<1>, SDTCisVec<2>, SDTCisSameNumEltsAs<0, 2>, SDTCisInt<3>
		]>;

def SDTMaskedLoad: SDTypeProfile<1, 3, [ // masked load		def SDTMaskedLoad: SDTypeProfile<1, 3, [ // masked load
SDTCisVec<0>, SDTCisPtrTy<1>, SDTCisVec<2>, SDTCisSameAs<0, 3>,		SDTCisVec<0>, SDTCisPtrTy<1>, SDTCisVec<2>, SDTCisSameAs<0, 3>,
SDTCisSameNumEltsAs<0, 2>		SDTCisSameNumEltsAs<0, 2>
]>;		]>;

		def SDTLoadEVL : SDTypeProfile<1, 3, [ // evl load
		SDTCisVec<0>, SDTCisPtrTy<1>, SDTCisSameNumEltsAs<0, 2>, SDTCisInt<3>,
		SDTCisSameNumEltsAs<0, 2>
		]>;

def SDTVecShuffle : SDTypeProfile<1, 2, [		def SDTVecShuffle : SDTypeProfile<1, 2, [
SDTCisSameAs<0, 1>, SDTCisSameAs<1, 2>		SDTCisSameAs<0, 1>, SDTCisSameAs<1, 2>
]>;		]>;
def SDTVecExtract : SDTypeProfile<1, 2, [ // vector extract		def SDTVecExtract : SDTypeProfile<1, 2, [ // vector extract
SDTCisEltOfVec<0, 1>, SDTCisPtrTy<2>		SDTCisEltOfVec<0, 1>, SDTCisPtrTy<2>
]>;		]>;
def SDTVecInsert : SDTypeProfile<1, 3, [ // vector insert		def SDTVecInsert : SDTypeProfile<1, 3, [ // vector insert
SDTCisEltOfVec<2, 1>, SDTCisSameAs<0, 1>, SDTCisPtrTy<3>		SDTCisEltOfVec<2, 1>, SDTCisSameAs<0, 1>, SDTCisPtrTy<3>
▲ Show 20 Lines • Show All 139 Lines • ▼ Show 20 Lines	def smin : SDNode<"ISD::SMIN" , SDTIntBinOp,
[SDNPCommutative, SDNPAssociative]>;		[SDNPCommutative, SDNPAssociative]>;
def smax : SDNode<"ISD::SMAX" , SDTIntBinOp,		def smax : SDNode<"ISD::SMAX" , SDTIntBinOp,
[SDNPCommutative, SDNPAssociative]>;		[SDNPCommutative, SDNPAssociative]>;
def umin : SDNode<"ISD::UMIN" , SDTIntBinOp,		def umin : SDNode<"ISD::UMIN" , SDTIntBinOp,
[SDNPCommutative, SDNPAssociative]>;		[SDNPCommutative, SDNPAssociative]>;
def umax : SDNode<"ISD::UMAX" , SDTIntBinOp,		def umax : SDNode<"ISD::UMAX" , SDTIntBinOp,
[SDNPCommutative, SDNPAssociative]>;		[SDNPCommutative, SDNPAssociative]>;

		def evl_and : SDNode<"ISD::EVL_AND" , SDTIntBinOpEVL ,
		[SDNPCommutative, SDNPAssociative]>;
		def evl_or : SDNode<"ISD::EVL_OR" , SDTIntBinOpEVL ,
		[SDNPCommutative, SDNPAssociative]>;
		def evl_xor : SDNode<"ISD::EVL_XOR" , SDTIntBinOpEVL ,
		[SDNPCommutative, SDNPAssociative]>;
		def evl_srl : SDNode<"ISD::EVL_SRL" , SDTIntShiftOpEVL>;
		def evl_sra : SDNode<"ISD::EVL_SRA" , SDTIntShiftOpEVL>;
		def evl_shl : SDNode<"ISD::EVL_SHL" , SDTIntShiftOpEVL>;

		def evl_add : SDNode<"ISD::EVL_ADD" , SDTIntBinOpEVL ,
		[SDNPCommutative, SDNPAssociative]>;
		def evl_sub : SDNode<"ISD::EVL_SUB" , SDTIntBinOpEVL>;
		def evl_mul : SDNode<"ISD::EVL_MUL" , SDTIntBinOpEVL,
		[SDNPCommutative, SDNPAssociative]>;
		def evl_sdiv : SDNode<"ISD::EVL_SDIV" , SDTIntBinOpEVL>;
		def evl_udiv : SDNode<"ISD::EVL_UDIV" , SDTIntBinOpEVL>;
		def evl_srem : SDNode<"ISD::EVL_SREM" , SDTIntBinOpEVL>;
		def evl_urem : SDNode<"ISD::EVL_UREM" , SDTIntBinOpEVL>;

def saddsat : SDNode<"ISD::SADDSAT" , SDTIntBinOp, [SDNPCommutative]>;		def saddsat : SDNode<"ISD::SADDSAT" , SDTIntBinOp, [SDNPCommutative]>;
def uaddsat : SDNode<"ISD::UADDSAT" , SDTIntBinOp, [SDNPCommutative]>;		def uaddsat : SDNode<"ISD::UADDSAT" , SDTIntBinOp, [SDNPCommutative]>;
def ssubsat : SDNode<"ISD::SSUBSAT" , SDTIntBinOp>;		def ssubsat : SDNode<"ISD::SSUBSAT" , SDTIntBinOp>;
def usubsat : SDNode<"ISD::USUBSAT" , SDTIntBinOp>;		def usubsat : SDNode<"ISD::USUBSAT" , SDTIntBinOp>;
def smulfix : SDNode<"ISD::SMULFIX" , SDTIntScaledBinOp, [SDNPCommutative]>;		def smulfix : SDNode<"ISD::SMULFIX" , SDTIntScaledBinOp, [SDNPCommutative]>;

def sext_inreg : SDNode<"ISD::SIGN_EXTEND_INREG", SDTExtInreg>;		def sext_inreg : SDNode<"ISD::SIGN_EXTEND_INREG", SDTExtInreg>;
def sext_invec : SDNode<"ISD::SIGN_EXTEND_VECTOR_INREG", SDTExtInvec>;		def sext_invec : SDNode<"ISD::SIGN_EXTEND_VECTOR_INREG", SDTExtInvec>;
▲ Show 20 Lines • Show All 51 Lines • ▼ Show 20 Lines
def ffloor : SDNode<"ISD::FFLOOR" , SDTFPUnaryOp>;		def ffloor : SDNode<"ISD::FFLOOR" , SDTFPUnaryOp>;
def fnearbyint : SDNode<"ISD::FNEARBYINT" , SDTFPUnaryOp>;		def fnearbyint : SDNode<"ISD::FNEARBYINT" , SDTFPUnaryOp>;
def fround : SDNode<"ISD::FROUND" , SDTFPUnaryOp>;		def fround : SDNode<"ISD::FROUND" , SDTFPUnaryOp>;

def fpround : SDNode<"ISD::FP_ROUND" , SDTFPRoundOp>;		def fpround : SDNode<"ISD::FP_ROUND" , SDTFPRoundOp>;
def fpextend : SDNode<"ISD::FP_EXTEND" , SDTFPExtendOp>;		def fpextend : SDNode<"ISD::FP_EXTEND" , SDTFPExtendOp>;
def fcopysign : SDNode<"ISD::FCOPYSIGN" , SDTFPSignOp>;		def fcopysign : SDNode<"ISD::FCOPYSIGN" , SDTFPSignOp>;

		def evl_fneg : SDNode<"ISD::EVL_FNEG" , SDTFPUnOpEVL>;
		def evl_fadd : SDNode<"ISD::EVL_FADD" , SDTFPBinOpEVL, [SDNPCommutative]>;
		def evl_fsub : SDNode<"ISD::EVL_FSUB" , SDTFPBinOpEVL>;
		def evl_fmul : SDNode<"ISD::EVL_FMUL" , SDTFPBinOpEVL, [SDNPCommutative]>;
		def evl_fdiv : SDNode<"ISD::EVL_FDIV" , SDTFPBinOpEVL>;
		def evl_frem : SDNode<"ISD::EVL_FREM" , SDTFPBinOpEVL>;
		def evl_fma : SDNode<"ISD::EVL_FMA" , SDTFPTernaryOpEVL>;

def sint_to_fp : SDNode<"ISD::SINT_TO_FP" , SDTIntToFPOp>;		def sint_to_fp : SDNode<"ISD::SINT_TO_FP" , SDTIntToFPOp>;
def uint_to_fp : SDNode<"ISD::UINT_TO_FP" , SDTIntToFPOp>;		def uint_to_fp : SDNode<"ISD::UINT_TO_FP" , SDTIntToFPOp>;
def fp_to_sint : SDNode<"ISD::FP_TO_SINT" , SDTFPToIntOp>;		def fp_to_sint : SDNode<"ISD::FP_TO_SINT" , SDTFPToIntOp>;
def fp_to_uint : SDNode<"ISD::FP_TO_UINT" , SDTFPToIntOp>;		def fp_to_uint : SDNode<"ISD::FP_TO_UINT" , SDTFPToIntOp>;
def f16_to_fp : SDNode<"ISD::FP16_TO_FP" , SDTIntToFPOp>;		def f16_to_fp : SDNode<"ISD::FP16_TO_FP" , SDTIntToFPOp>;
def fp_to_f16 : SDNode<"ISD::FP_TO_FP16" , SDTFPToIntOp>;		def fp_to_f16 : SDNode<"ISD::FP_TO_FP16" , SDTFPToIntOp>;

def setcc : SDNode<"ISD::SETCC" , SDTSetCC>;		def setcc : SDNode<"ISD::SETCC" , SDTSetCC>;
def select : SDNode<"ISD::SELECT" , SDTSelect>;		def select : SDNode<"ISD::SELECT" , SDTSelect>;
def vselect : SDNode<"ISD::VSELECT" , SDTVSelect>;		def vselect : SDNode<"ISD::VSELECT" , SDTVSelect>;
def selectcc : SDNode<"ISD::SELECT_CC" , SDTSelectCC>;		def selectcc : SDNode<"ISD::SELECT_CC" , SDTSelectCC>;

def brcc : SDNode<"ISD::BR_CC" , SDTBrCC, [SDNPHasChain]>;		def brcc : SDNode<"ISD::BR_CC" , SDTBrCC, [SDNPHasChain]>;
def brcond : SDNode<"ISD::BRCOND" , SDTBrcond, [SDNPHasChain]>;		def brcond : SDNode<"ISD::BRCOND" , SDTBrcond, [SDNPHasChain]>;
def brind : SDNode<"ISD::BRIND" , SDTBrind, [SDNPHasChain]>;		def brind : SDNode<"ISD::BRIND" , SDTBrind, [SDNPHasChain]>;
def br : SDNode<"ISD::BR" , SDTBr, [SDNPHasChain]>;		def br : SDNode<"ISD::BR" , SDTBr, [SDNPHasChain]>;
def catchret : SDNode<"ISD::CATCHRET" , SDTCatchret,		def catchret : SDNode<"ISD::CATCHRET" , SDTCatchret,
[SDNPHasChain, SDNPSideEffect]>;		[SDNPHasChain, SDNPSideEffect]>;
def cleanupret : SDNode<"ISD::CLEANUPRET" , SDTNone, [SDNPHasChain]>;		def cleanupret : SDNode<"ISD::CLEANUPRET" , SDTNone, [SDNPHasChain]>;
▲ Show 20 Lines • Show All 51 Lines • ▼ Show 20 Lines
def atomic_store : SDNode<"ISD::ATOMIC_STORE", SDTAtomicStore,		def atomic_store : SDNode<"ISD::ATOMIC_STORE", SDTAtomicStore,
[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;		[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;

def masked_store : SDNode<"ISD::MSTORE", SDTMaskedStore,		def masked_store : SDNode<"ISD::MSTORE", SDTMaskedStore,
[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;		[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
def masked_load : SDNode<"ISD::MLOAD", SDTMaskedLoad,		def masked_load : SDNode<"ISD::MLOAD", SDTMaskedLoad,
[SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;		[SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;

		def evl_store : SDNode<"ISD::EVL_STORE", SDTMaskedStore,
		[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
		def evl_load : SDNode<"ISD::EVL_LOAD", SDTMaskedLoad,
		[SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;

// Do not use ld, st directly. Use load, extload, sextload, zextload, store,		// Do not use ld, st directly. Use load, extload, sextload, zextload, store,
// and truncst (see below).		// and truncst (see below).
def ld : SDNode<"ISD::LOAD" , SDTLoad,		def ld : SDNode<"ISD::LOAD" , SDTLoad,
[SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;		[SDNPHasChain, SDNPMayLoad, SDNPMemOperand]>;
def st : SDNode<"ISD::STORE" , SDTStore,		def st : SDNode<"ISD::STORE" , SDTStore,
[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;		[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
def ist : SDNode<"ISD::STORE" , SDTIStore,		def ist : SDNode<"ISD::STORE" , SDTIStore,
[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;		[SDNPHasChain, SDNPMayStore, SDNPMemOperand]>;
▲ Show 20 Lines • Show All 828 Lines • Show Last 20 Lines

lib/AsmParser/LLLexer.cpp

Show First 20 Lines • Show All 636 Lines • ▼ Show 20 Lines	#define KEYWORD(STR) \
KEYWORD(convergent);		KEYWORD(convergent);
KEYWORD(dereferenceable);		KEYWORD(dereferenceable);
KEYWORD(dereferenceable_or_null);		KEYWORD(dereferenceable_or_null);
KEYWORD(inaccessiblememonly);		KEYWORD(inaccessiblememonly);
KEYWORD(inaccessiblemem_or_argmemonly);		KEYWORD(inaccessiblemem_or_argmemonly);
KEYWORD(inlinehint);		KEYWORD(inlinehint);
KEYWORD(inreg);		KEYWORD(inreg);
KEYWORD(jumptable);		KEYWORD(jumptable);
		KEYWORD(mask);
KEYWORD(minsize);		KEYWORD(minsize);
KEYWORD(naked);		KEYWORD(naked);
KEYWORD(nest);		KEYWORD(nest);
KEYWORD(noalias);		KEYWORD(noalias);
KEYWORD(nobuiltin);		KEYWORD(nobuiltin);
KEYWORD(nocapture);		KEYWORD(nocapture);
KEYWORD(noduplicate);		KEYWORD(noduplicate);
KEYWORD(noimplicitfloat);		KEYWORD(noimplicitfloat);
KEYWORD(noinline);		KEYWORD(noinline);
KEYWORD(norecurse);		KEYWORD(norecurse);
KEYWORD(nonlazybind);		KEYWORD(nonlazybind);
KEYWORD(nonnull);		KEYWORD(nonnull);
KEYWORD(noredzone);		KEYWORD(noredzone);
KEYWORD(noreturn);		KEYWORD(noreturn);
KEYWORD(nocf_check);		KEYWORD(nocf_check);
KEYWORD(nounwind);		KEYWORD(nounwind);
KEYWORD(optforfuzzing);		KEYWORD(optforfuzzing);
KEYWORD(optnone);		KEYWORD(optnone);
KEYWORD(optsize);		KEYWORD(optsize);
		KEYWORD(passthru);
KEYWORD(readnone);		KEYWORD(readnone);
KEYWORD(readonly);		KEYWORD(readonly);
KEYWORD(returned);		KEYWORD(returned);
KEYWORD(returns_twice);		KEYWORD(returns_twice);
KEYWORD(signext);		KEYWORD(signext);
KEYWORD(speculatable);		KEYWORD(speculatable);
KEYWORD(sret);		KEYWORD(sret);
KEYWORD(ssp);		KEYWORD(ssp);
KEYWORD(sspreq);		KEYWORD(sspreq);
KEYWORD(sspstrong);		KEYWORD(sspstrong);
KEYWORD(strictfp);		KEYWORD(strictfp);
KEYWORD(safestack);		KEYWORD(safestack);
KEYWORD(shadowcallstack);		KEYWORD(shadowcallstack);
KEYWORD(sanitize_address);		KEYWORD(sanitize_address);
KEYWORD(sanitize_hwaddress);		KEYWORD(sanitize_hwaddress);
KEYWORD(sanitize_thread);		KEYWORD(sanitize_thread);
KEYWORD(sanitize_memory);		KEYWORD(sanitize_memory);
KEYWORD(speculative_load_hardening);		KEYWORD(speculative_load_hardening);
KEYWORD(swifterror);		KEYWORD(swifterror);
KEYWORD(swiftself);		KEYWORD(swiftself);
KEYWORD(uwtable);		KEYWORD(uwtable);
		KEYWORD(vlen);
KEYWORD(writeonly);		KEYWORD(writeonly);
KEYWORD(zeroext);		KEYWORD(zeroext);

KEYWORD(type);		KEYWORD(type);
KEYWORD(opaque);		KEYWORD(opaque);

KEYWORD(comdat);		KEYWORD(comdat);

▲ Show 20 Lines • Show All 429 Lines • Show Last 20 Lines

lib/AsmParser/LLParser.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 1,288 Lines • ▼ Show 20 Lines	case lltok::kw_zeroext:
HaveError \|=		HaveError \|=
Error(Lex.getLoc(),		Error(Lex.getLoc(),
"invalid use of attribute on a function");		"invalid use of attribute on a function");
break;		break;
case lltok::kw_byval:		case lltok::kw_byval:
case lltok::kw_dereferenceable:		case lltok::kw_dereferenceable:
case lltok::kw_dereferenceable_or_null:		case lltok::kw_dereferenceable_or_null:
case lltok::kw_inalloca:		case lltok::kw_inalloca:
		case lltok::kw_mask:
case lltok::kw_nest:		case lltok::kw_nest:
case lltok::kw_noalias:		case lltok::kw_noalias:
case lltok::kw_nocapture:		case lltok::kw_nocapture:
case lltok::kw_nonnull:		case lltok::kw_nonnull:
		case lltok::kw_passthru:
case lltok::kw_returned:		case lltok::kw_returned:
case lltok::kw_sret:		case lltok::kw_sret:
case lltok::kw_swifterror:		case lltok::kw_swifterror:
case lltok::kw_swiftself:		case lltok::kw_swiftself:
		case lltok::kw_vlen:
HaveError \|=		HaveError \|=
Error(Lex.getLoc(),		Error(Lex.getLoc(),
"invalid use of parameter-only attribute on a function");		"invalid use of parameter-only attribute on a function");
break;		break;
}		}

Lex.Lex();		Lex.Lex();
}		}
▲ Show 20 Lines • Show All 264 Lines • ▼ Show 20 Lines	case lltok::kw_dereferenceable_or_null: {
uint64_t Bytes;		uint64_t Bytes;
if (ParseOptionalDerefAttrBytes(lltok::kw_dereferenceable_or_null, Bytes))		if (ParseOptionalDerefAttrBytes(lltok::kw_dereferenceable_or_null, Bytes))
return true;		return true;
B.addDereferenceableOrNullAttr(Bytes);		B.addDereferenceableOrNullAttr(Bytes);
continue;		continue;
}		}
case lltok::kw_inalloca: B.addAttribute(Attribute::InAlloca); break;		case lltok::kw_inalloca: B.addAttribute(Attribute::InAlloca); break;
case lltok::kw_inreg: B.addAttribute(Attribute::InReg); break;		case lltok::kw_inreg: B.addAttribute(Attribute::InReg); break;
		case lltok::kw_mask: B.addAttribute(Attribute::Mask); break;
case lltok::kw_nest: B.addAttribute(Attribute::Nest); break;		case lltok::kw_nest: B.addAttribute(Attribute::Nest); break;
case lltok::kw_noalias: B.addAttribute(Attribute::NoAlias); break;		case lltok::kw_noalias: B.addAttribute(Attribute::NoAlias); break;
case lltok::kw_nocapture: B.addAttribute(Attribute::NoCapture); break;		case lltok::kw_nocapture: B.addAttribute(Attribute::NoCapture); break;
case lltok::kw_nonnull: B.addAttribute(Attribute::NonNull); break;		case lltok::kw_nonnull: B.addAttribute(Attribute::NonNull); break;
		case lltok::kw_passthru: B.addAttribute(Attribute::Passthru); break;
case lltok::kw_readnone: B.addAttribute(Attribute::ReadNone); break;		case lltok::kw_readnone: B.addAttribute(Attribute::ReadNone); break;
case lltok::kw_readonly: B.addAttribute(Attribute::ReadOnly); break;		case lltok::kw_readonly: B.addAttribute(Attribute::ReadOnly); break;
case lltok::kw_returned: B.addAttribute(Attribute::Returned); break;		case lltok::kw_returned: B.addAttribute(Attribute::Returned); break;
case lltok::kw_signext: B.addAttribute(Attribute::SExt); break;		case lltok::kw_signext: B.addAttribute(Attribute::SExt); break;
case lltok::kw_sret: B.addAttribute(Attribute::StructRet); break;		case lltok::kw_sret: B.addAttribute(Attribute::StructRet); break;
case lltok::kw_swifterror: B.addAttribute(Attribute::SwiftError); break;		case lltok::kw_swifterror: B.addAttribute(Attribute::SwiftError); break;
case lltok::kw_swiftself: B.addAttribute(Attribute::SwiftSelf); break;		case lltok::kw_swiftself: B.addAttribute(Attribute::SwiftSelf); break;
		case lltok::kw_vlen: B.addAttribute(Attribute::VectorLength); break;
case lltok::kw_writeonly: B.addAttribute(Attribute::WriteOnly); break;		case lltok::kw_writeonly: B.addAttribute(Attribute::WriteOnly); break;
case lltok::kw_zeroext: B.addAttribute(Attribute::ZExt); break;		case lltok::kw_zeroext: B.addAttribute(Attribute::ZExt); break;

case lltok::kw_alignstack:		case lltok::kw_alignstack:
case lltok::kw_alwaysinline:		case lltok::kw_alwaysinline:
case lltok::kw_argmemonly:		case lltok::kw_argmemonly:
case lltok::kw_builtin:		case lltok::kw_builtin:
case lltok::kw_inlinehint:		case lltok::kw_inlinehint:
▲ Show 20 Lines • Show All 74 Lines • ▼ Show 20 Lines	while (true) {
case lltok::kw_noalias: B.addAttribute(Attribute::NoAlias); break;		case lltok::kw_noalias: B.addAttribute(Attribute::NoAlias); break;
case lltok::kw_nonnull: B.addAttribute(Attribute::NonNull); break;		case lltok::kw_nonnull: B.addAttribute(Attribute::NonNull); break;
case lltok::kw_signext: B.addAttribute(Attribute::SExt); break;		case lltok::kw_signext: B.addAttribute(Attribute::SExt); break;
case lltok::kw_zeroext: B.addAttribute(Attribute::ZExt); break;		case lltok::kw_zeroext: B.addAttribute(Attribute::ZExt); break;

// Error handling.		// Error handling.
case lltok::kw_byval:		case lltok::kw_byval:
case lltok::kw_inalloca:		case lltok::kw_inalloca:
		case lltok::kw_mask:
case lltok::kw_nest:		case lltok::kw_nest:
case lltok::kw_nocapture:		case lltok::kw_nocapture:
		case lltok::kw_passthru:
case lltok::kw_returned:		case lltok::kw_returned:
case lltok::kw_sret:		case lltok::kw_sret:
case lltok::kw_swifterror:		case lltok::kw_swifterror:
case lltok::kw_swiftself:		case lltok::kw_swiftself:
		case lltok::kw_vlen:
HaveError \|= Error(Lex.getLoc(), "invalid use of parameter-only attribute");		HaveError \|= Error(Lex.getLoc(), "invalid use of parameter-only attribute");
break;		break;

case lltok::kw_alignstack:		case lltok::kw_alignstack:
case lltok::kw_alwaysinline:		case lltok::kw_alwaysinline:
case lltok::kw_argmemonly:		case lltok::kw_argmemonly:
case lltok::kw_builtin:		case lltok::kw_builtin:
case lltok::kw_cold:		case lltok::kw_cold:
▲ Show 20 Lines • Show All 1,589 Lines • ▼ Show 20 Lines	if (Opc == Instruction::FCmp) {
if (!Val0->getType()->isIntOrIntVectorTy() &&		if (!Val0->getType()->isIntOrIntVectorTy() &&
!Val0->getType()->isPtrOrPtrVectorTy())		!Val0->getType()->isPtrOrPtrVectorTy())
return Error(ID.Loc, "icmp requires pointer or integer operands");		return Error(ID.Loc, "icmp requires pointer or integer operands");
ID.ConstantVal = ConstantExpr::getICmp(Pred, Val0, Val1);		ID.ConstantVal = ConstantExpr::getICmp(Pred, Val0, Val1);
}		}
ID.Kind = ValID::t_Constant;		ID.Kind = ValID::t_Constant;
return false;		return false;
}		}

// Unary Operators.		// Unary Operators.
case lltok::kw_fneg: {		case lltok::kw_fneg: {
unsigned Opc = Lex.getUIntVal();		unsigned Opc = Lex.getUIntVal();
Constant *Val;		Constant *Val;
Lex.Lex();		Lex.Lex();
if (ParseToken(lltok::lparen, "expected '(' in unary constantexpr") \|\|		if (ParseToken(lltok::lparen, "expected '(' in unary constantexpr") \|\|
ParseGlobalTypeAndValue(Val) \|\|		ParseGlobalTypeAndValue(Val) \|\|
ParseToken(lltok::rparen, "expected ')' in unary constantexpr"))		ParseToken(lltok::rparen, "expected ')' in unary constantexpr"))
return true;		return true;

// Check that the type is valid for the operator.		// Check that the type is valid for the operator.
switch (Opc) {		switch (Opc) {
case Instruction::FNeg:		case Instruction::FNeg:
if (!Val->getType()->isFPOrFPVectorTy())		if (!Val->getType()->isFPOrFPVectorTy())
return Error(ID.Loc, "constexpr requires fp operands");		return Error(ID.Loc, "constexpr requires fp operands");
break;		break;
default: llvm_unreachable("Unknown unary operator!");		default: llvm_unreachable("Unknown unary operator!");
}		}
▲ Show 20 Lines • Show All 2,848 Lines • ▼ Show 20 Lines	bool LLParser::ParseUnaryOp(Instruction *&Inst, PerFunctionState &PFS,

bool Valid;		bool Valid;
switch (OperandType) {		switch (OperandType) {
default: llvm_unreachable("Unknown operand type!");		default: llvm_unreachable("Unknown operand type!");
case 0: // int or FP.		case 0: // int or FP.
Valid = LHS->getType()->isIntOrIntVectorTy() \|\|		Valid = LHS->getType()->isIntOrIntVectorTy() \|\|
LHS->getType()->isFPOrFPVectorTy();		LHS->getType()->isFPOrFPVectorTy();
break;		break;
case 1:		case 1:
Valid = LHS->getType()->isIntOrIntVectorTy();		Valid = LHS->getType()->isIntOrIntVectorTy();
break;		break;
case 2:		case 2:
Valid = LHS->getType()->isFPOrFPVectorTy();		Valid = LHS->getType()->isFPOrFPVectorTy();
break;		break;
}		}

if (!Valid)		if (!Valid)
return Error(Loc, "invalid operand type for instruction");		return Error(Loc, "invalid operand type for instruction");

Inst = UnaryOperator::Create((Instruction::UnaryOps)Opc, LHS);		Inst = UnaryOperator::Create((Instruction::UnaryOps)Opc, LHS);
return false;		return false;
▲ Show 20 Lines • Show All 2,224 Lines • Show Last 20 Lines

lib/AsmParser/LLToken.h

Show First 20 Lines • Show All 180 Lines • ▼ Show 20 Lines	enum Kind {
kw_convergent,		kw_convergent,
kw_dereferenceable,		kw_dereferenceable,
kw_dereferenceable_or_null,		kw_dereferenceable_or_null,
kw_inaccessiblememonly,		kw_inaccessiblememonly,
kw_inaccessiblemem_or_argmemonly,		kw_inaccessiblemem_or_argmemonly,
kw_inlinehint,		kw_inlinehint,
kw_inreg,		kw_inreg,
kw_jumptable,		kw_jumptable,
		kw_mask,
kw_minsize,		kw_minsize,
kw_naked,		kw_naked,
kw_nest,		kw_nest,
kw_noalias,		kw_noalias,
kw_nobuiltin,		kw_nobuiltin,
kw_nocapture,		kw_nocapture,
kw_noduplicate,		kw_noduplicate,
kw_noimplicitfloat,		kw_noimplicitfloat,
kw_noinline,		kw_noinline,
kw_norecurse,		kw_norecurse,
kw_nonlazybind,		kw_nonlazybind,
kw_nonnull,		kw_nonnull,
kw_noredzone,		kw_noredzone,
kw_noreturn,		kw_noreturn,
kw_nocf_check,		kw_nocf_check,
kw_nounwind,		kw_nounwind,
kw_optforfuzzing,		kw_optforfuzzing,
kw_optnone,		kw_optnone,
kw_optsize,		kw_optsize,
		kw_passthru,
kw_readnone,		kw_readnone,
kw_readonly,		kw_readonly,
kw_returned,		kw_returned,
kw_returns_twice,		kw_returns_twice,
kw_signext,		kw_signext,
kw_speculatable,		kw_speculatable,
kw_ssp,		kw_ssp,
kw_sspreq,		kw_sspreq,
kw_sspstrong,		kw_sspstrong,
kw_safestack,		kw_safestack,
kw_shadowcallstack,		kw_shadowcallstack,
kw_sret,		kw_sret,
kw_sanitize_thread,		kw_sanitize_thread,
kw_sanitize_memory,		kw_sanitize_memory,
kw_speculative_load_hardening,		kw_speculative_load_hardening,
kw_strictfp,		kw_strictfp,
kw_swifterror,		kw_swifterror,
kw_swiftself,		kw_swiftself,
kw_uwtable,		kw_uwtable,
		kw_vlen,
kw_writeonly,		kw_writeonly,
kw_zeroext,		kw_zeroext,

kw_type,		kw_type,
kw_opaque,		kw_opaque,

kw_comdat,		kw_comdat,

▲ Show 20 Lines • Show All 223 Lines • Show Last 20 Lines

lib/Bitcode/Reader/BitcodeReader.cpp

Show First 20 Lines • Show All 1,326 Lines • ▼ Show 20 Lines	static Attribute::AttrKind getAttrFromCode(uint64_t Code) {
case bitc::ATTR_KIND_INACCESSIBLEMEM_OR_ARGMEMONLY:		case bitc::ATTR_KIND_INACCESSIBLEMEM_OR_ARGMEMONLY:
return Attribute::InaccessibleMemOrArgMemOnly;		return Attribute::InaccessibleMemOrArgMemOnly;
case bitc::ATTR_KIND_INLINE_HINT:		case bitc::ATTR_KIND_INLINE_HINT:
return Attribute::InlineHint;		return Attribute::InlineHint;
case bitc::ATTR_KIND_IN_REG:		case bitc::ATTR_KIND_IN_REG:
return Attribute::InReg;		return Attribute::InReg;
case bitc::ATTR_KIND_JUMP_TABLE:		case bitc::ATTR_KIND_JUMP_TABLE:
return Attribute::JumpTable;		return Attribute::JumpTable;
		case bitc::ATTR_KIND_MASK:
		return Attribute::Mask;
case bitc::ATTR_KIND_MIN_SIZE:		case bitc::ATTR_KIND_MIN_SIZE:
return Attribute::MinSize;		return Attribute::MinSize;
case bitc::ATTR_KIND_NAKED:		case bitc::ATTR_KIND_NAKED:
return Attribute::Naked;		return Attribute::Naked;
case bitc::ATTR_KIND_NEST:		case bitc::ATTR_KIND_NEST:
return Attribute::Nest;		return Attribute::Nest;
case bitc::ATTR_KIND_NO_ALIAS:		case bitc::ATTR_KIND_NO_ALIAS:
return Attribute::NoAlias;		return Attribute::NoAlias;
Show All 28 Lines	static Attribute::AttrKind getAttrFromCode(uint64_t Code) {
case bitc::ATTR_KIND_NO_UNWIND:		case bitc::ATTR_KIND_NO_UNWIND:
return Attribute::NoUnwind;		return Attribute::NoUnwind;
case bitc::ATTR_KIND_OPT_FOR_FUZZING:		case bitc::ATTR_KIND_OPT_FOR_FUZZING:
return Attribute::OptForFuzzing;		return Attribute::OptForFuzzing;
case bitc::ATTR_KIND_OPTIMIZE_FOR_SIZE:		case bitc::ATTR_KIND_OPTIMIZE_FOR_SIZE:
return Attribute::OptimizeForSize;		return Attribute::OptimizeForSize;
case bitc::ATTR_KIND_OPTIMIZE_NONE:		case bitc::ATTR_KIND_OPTIMIZE_NONE:
return Attribute::OptimizeNone;		return Attribute::OptimizeNone;
		case bitc::ATTR_KIND_PASSTHRU:
		return Attribute::Passthru;
case bitc::ATTR_KIND_READ_NONE:		case bitc::ATTR_KIND_READ_NONE:
return Attribute::ReadNone;		return Attribute::ReadNone;
case bitc::ATTR_KIND_READ_ONLY:		case bitc::ATTR_KIND_READ_ONLY:
return Attribute::ReadOnly;		return Attribute::ReadOnly;
case bitc::ATTR_KIND_RETURNED:		case bitc::ATTR_KIND_RETURNED:
return Attribute::Returned;		return Attribute::Returned;
case bitc::ATTR_KIND_RETURNS_TWICE:		case bitc::ATTR_KIND_RETURNS_TWICE:
return Attribute::ReturnsTwice;		return Attribute::ReturnsTwice;
Show All 28 Lines	static Attribute::AttrKind getAttrFromCode(uint64_t Code) {
case bitc::ATTR_KIND_SPECULATIVE_LOAD_HARDENING:		case bitc::ATTR_KIND_SPECULATIVE_LOAD_HARDENING:
return Attribute::SpeculativeLoadHardening;		return Attribute::SpeculativeLoadHardening;
case bitc::ATTR_KIND_SWIFT_ERROR:		case bitc::ATTR_KIND_SWIFT_ERROR:
return Attribute::SwiftError;		return Attribute::SwiftError;
case bitc::ATTR_KIND_SWIFT_SELF:		case bitc::ATTR_KIND_SWIFT_SELF:
return Attribute::SwiftSelf;		return Attribute::SwiftSelf;
case bitc::ATTR_KIND_UW_TABLE:		case bitc::ATTR_KIND_UW_TABLE:
return Attribute::UWTable;		return Attribute::UWTable;
		case bitc::ATTR_KIND_VECTORLENGTH:
		return Attribute::VectorLength;
case bitc::ATTR_KIND_WRITEONLY:		case bitc::ATTR_KIND_WRITEONLY:
return Attribute::WriteOnly;		return Attribute::WriteOnly;
case bitc::ATTR_KIND_Z_EXT:		case bitc::ATTR_KIND_Z_EXT:
return Attribute::ZExt;		return Attribute::ZExt;
}		}
}		}

Error BitcodeReader::parseAlignmentValue(uint64_t Exponent,		Error BitcodeReader::parseAlignmentValue(uint64_t Exponent,
▲ Show 20 Lines • Show All 4,700 Lines • Show Last 20 Lines

lib/Bitcode/Writer/BitcodeWriter.cpp

Show First 20 Lines • Show All 664 Lines • ▼ Show 20 Lines	static uint64_t getAttrKindEncoding(Attribute::AttrKind Kind) {
case Attribute::OptimizeNone:		case Attribute::OptimizeNone:
return bitc::ATTR_KIND_OPTIMIZE_NONE;		return bitc::ATTR_KIND_OPTIMIZE_NONE;
case Attribute::ReadNone:		case Attribute::ReadNone:
return bitc::ATTR_KIND_READ_NONE;		return bitc::ATTR_KIND_READ_NONE;
case Attribute::ReadOnly:		case Attribute::ReadOnly:
return bitc::ATTR_KIND_READ_ONLY;		return bitc::ATTR_KIND_READ_ONLY;
case Attribute::Returned:		case Attribute::Returned:
return bitc::ATTR_KIND_RETURNED;		return bitc::ATTR_KIND_RETURNED;
		case Attribute::Mask:
		return bitc::ATTR_KIND_MASK;
		case Attribute::VectorLength:
		return bitc::ATTR_KIND_VECTORLENGTH;
		case Attribute::Passthru:
		return bitc::ATTR_KIND_PASSTHRU;
case Attribute::ReturnsTwice:		case Attribute::ReturnsTwice:
return bitc::ATTR_KIND_RETURNS_TWICE;		return bitc::ATTR_KIND_RETURNS_TWICE;
case Attribute::SExt:		case Attribute::SExt:
return bitc::ATTR_KIND_S_EXT;		return bitc::ATTR_KIND_S_EXT;
case Attribute::Speculatable:		case Attribute::Speculatable:
return bitc::ATTR_KIND_SPECULATABLE;		return bitc::ATTR_KIND_SPECULATABLE;
case Attribute::StackAlignment:		case Attribute::StackAlignment:
return bitc::ATTR_KIND_STACK_ALIGNMENT;		return bitc::ATTR_KIND_STACK_ALIGNMENT;
▲ Show 20 Lines • Show All 3,819 Lines • Show Last 20 Lines

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 347 Lines • ▼ Show 20 Lines	private:
SDValue visitAssertExt(SDNode *N);		SDValue visitAssertExt(SDNode *N);
SDValue visitSIGN_EXTEND_INREG(SDNode *N);		SDValue visitSIGN_EXTEND_INREG(SDNode *N);
SDValue visitSIGN_EXTEND_VECTOR_INREG(SDNode *N);		SDValue visitSIGN_EXTEND_VECTOR_INREG(SDNode *N);
SDValue visitZERO_EXTEND_VECTOR_INREG(SDNode *N);		SDValue visitZERO_EXTEND_VECTOR_INREG(SDNode *N);
SDValue visitTRUNCATE(SDNode *N);		SDValue visitTRUNCATE(SDNode *N);
SDValue visitBITCAST(SDNode *N);		SDValue visitBITCAST(SDNode *N);
SDValue visitBUILD_PAIR(SDNode *N);		SDValue visitBUILD_PAIR(SDNode *N);
SDValue visitFADD(SDNode *N);		SDValue visitFADD(SDNode *N);
		SDValue visitFADD_EVL(SDNode *N);
SDValue visitFSUB(SDNode *N);		SDValue visitFSUB(SDNode *N);
SDValue visitFMUL(SDNode *N);		SDValue visitFMUL(SDNode *N);
SDValue visitFMA(SDNode *N);		SDValue visitFMA(SDNode *N);
SDValue visitFDIV(SDNode *N);		SDValue visitFDIV(SDNode *N);
SDValue visitFREM(SDNode *N);		SDValue visitFREM(SDNode *N);
SDValue visitFSQRT(SDNode *N);		SDValue visitFSQRT(SDNode *N);
SDValue visitFCOPYSIGN(SDNode *N);		SDValue visitFCOPYSIGN(SDNode *N);
SDValue visitFPOW(SDNode *N);		SDValue visitFPOW(SDNode *N);
Show All 31 Lines	private:
SDValue visitINSERT_SUBVECTOR(SDNode *N);		SDValue visitINSERT_SUBVECTOR(SDNode *N);
SDValue visitMLOAD(SDNode *N);		SDValue visitMLOAD(SDNode *N);
SDValue visitMSTORE(SDNode *N);		SDValue visitMSTORE(SDNode *N);
SDValue visitMGATHER(SDNode *N);		SDValue visitMGATHER(SDNode *N);
SDValue visitMSCATTER(SDNode *N);		SDValue visitMSCATTER(SDNode *N);
SDValue visitFP_TO_FP16(SDNode *N);		SDValue visitFP_TO_FP16(SDNode *N);
SDValue visitFP16_TO_FP(SDNode *N);		SDValue visitFP16_TO_FP(SDNode *N);

		template<class MatchContextClass>
SDValue visitFADDForFMACombine(SDNode *N);		SDValue visitFADDForFMACombine(SDNode *N);
SDValue visitFSUBForFMACombine(SDNode *N);		SDValue visitFSUBForFMACombine(SDNode *N);
SDValue visitFMULForFMADistributiveCombine(SDNode *N);		SDValue visitFMULForFMADistributiveCombine(SDNode *N);

SDValue XformToShuffleWithZero(SDNode *N);		SDValue XformToShuffleWithZero(SDNode *N);
SDValue ReassociateOps(unsigned Opc, const SDLoc &DL, SDValue N0,		SDValue ReassociateOps(unsigned Opc, const SDLoc &DL, SDValue N0,
SDValue N1, SDNodeFlags Flags);		SDValue N1, SDNodeFlags Flags);

▲ Show 20 Lines • Show All 225 Lines • ▼ Show 20 Lines	public:
explicit WorklistRemover(DAGCombiner &dc)		explicit WorklistRemover(DAGCombiner &dc)
: SelectionDAG::DAGUpdateListener(dc.getDAG()), DC(dc) {}		: SelectionDAG::DAGUpdateListener(dc.getDAG()), DC(dc) {}

void NodeDeleted(SDNode N, SDNode E) override {		void NodeDeleted(SDNode N, SDNode E) override {
DC.removeFromWorklist(N);		DC.removeFromWorklist(N);
}		}
};		};

		// TODO port this to EVL nodes
		struct EmptyMatchContext {
		SelectionDAG & DAG;

		EmptyMatchContext(SelectionDAG & DAG, SDNode * Root)
		: DAG(DAG)
		{}

		bool match(SDValue OpN, unsigned OpCode) const { return OpCode == OpN->getOpcode(); }

		unsigned getFunctionOpCode(SDValue N) const {
		return N->getOpcode();
		}

		bool isCompatible(SDValue OpVal) const { return true; }

		// Specialize based on number of operands.
		SDValue getNode(unsigned Opcode, const SDLoc &DL, EVT VT) { return DAG.getNode(Opcode, DL, VT); }
		SDValue getNode(unsigned Opcode, const SDLoc &DL, EVT VT, SDValue Operand,
		const SDNodeFlags Flags = SDNodeFlags()) {
		return DAG.getNode(Opcode, DL, VT, Operand, Flags);
		}
		SDValue getNode(unsigned Opcode, const SDLoc &DL, EVT VT, SDValue N1,
		SDValue N2, const SDNodeFlags Flags = SDNodeFlags()) {
		return DAG.getNode(Opcode, DL, VT, N1, N2, Flags);
		}
		SDValue getNode(unsigned Opcode, const SDLoc &DL, EVT VT, SDValue N1,
		SDValue N2, SDValue N3,
		const SDNodeFlags Flags = SDNodeFlags()) {
		return DAG.getNode(Opcode, DL, VT, N1, N2, N3);
		}

		SDValue getNode(unsigned Opcode, const SDLoc &DL, EVT VT, SDValue N1,
		SDValue N2, SDValue N3, SDValue N4) {
		return DAG.getNode(Opcode, DL, VT, N1, N2, N3, N4);
		}

		SDValue getNode(unsigned Opcode, const SDLoc &DL, EVT VT, SDValue N1,
		SDValue N2, SDValue N3, SDValue N4, SDValue N5) {
		return DAG.getNode(Opcode, DL, VT, N1, N2, N3, N4, N5);
		}
		};

		struct
		EVLMatchContext {
		SelectionDAG & DAG;
		SDNode * Root;
		SDValue RootMaskOp;
		SDValue RootVectorLenOp;

		EVLMatchContext(SelectionDAG & DAG, SDNode * Root)
		: DAG(DAG)
		, Root(Root)
		, RootMaskOp()
		, RootVectorLenOp()
		{
		if (Root->isEVL()) {
		int RootMaskPos = ISD::GetMaskPosEVL(Root->getOpcode());
		if (RootMaskPos != -1) {
		RootMaskOp = Root->getOperand(RootMaskPos);
		}

		int RootVLenPos = ISD::GetVectorLengthPosEVL(Root->getOpcode());
		if (RootVLenPos != -1) {
		RootVectorLenOp = Root->getOperand(RootVLenPos);
		}
		}
		}

		unsigned getFunctionOpCode(SDValue N) const {
		unsigned EVLOpCode = N->getOpcode();
		return ISD::GetFunctionOpCodeForEVL(EVLOpCode);
		}

		bool isCompatible(SDValue OpVal) const {
		if (!OpVal->isEVL()) {
		return !Root->isEVL();

		} else {
		unsigned EVLOpCode = OpVal->getOpcode();
		int MaskPos = ISD::GetMaskPosEVL(EVLOpCode);
		if (MaskPos != -1 && RootMaskOp != OpVal.getOperand(MaskPos)) {
		return false;
		}

		int VLenPos = ISD::GetVectorLengthPosEVL(EVLOpCode);
		if (VLenPos != -1 && RootVectorLenOp != OpVal.getOperand(VLenPos)) {
		return false;
		}

		return true;
		}
		}

		/// whether \p OpN is a node that is functionally compatible with the NodeType \p OpNodeTy
		bool match(SDValue OpVal, unsigned OpNT) const {
		return isCompatible(OpVal) && getFunctionOpCode(OpVal) == OpNT;
		}

		// Specialize based on number of operands.
		// TODO emit EVL intrinsics where MaskOp/VectorLenOp != null
		// SDValue getNode(unsigned Opcode, const SDLoc &DL, EVT VT) { return DAG.getNode(Opcode, DL, VT); }
		SDValue getNode(unsigned Opcode, const SDLoc &DL, EVT VT, SDValue Operand,
		const SDNodeFlags Flags = SDNodeFlags()) {
		unsigned EVLOpcode = ISD::GetEVLForFunctionOpCode(Opcode);
		int MaskPos = ISD::GetMaskPosEVL(EVLOpcode);
		int VLenPos = ISD::GetVectorLengthPosEVL(EVLOpcode);
		assert(MaskPos == 1 && VLenPos == 2);

		return DAG.getNode(EVLOpcode, DL, VT, {Operand, RootMaskOp, RootVectorLenOp}, Flags);
		}
		SDValue getNode(unsigned Opcode, const SDLoc &DL, EVT VT, SDValue N1,
		SDValue N2, const SDNodeFlags Flags = SDNodeFlags()) {
		unsigned EVLOpcode = ISD::GetEVLForFunctionOpCode(Opcode);
		int MaskPos = ISD::GetMaskPosEVL(EVLOpcode);
		int VLenPos = ISD::GetVectorLengthPosEVL(EVLOpcode);
		assert(MaskPos == 2 && VLenPos == 3);

		return DAG.getNode(EVLOpcode, DL, VT, {N1, N2, RootMaskOp, RootVectorLenOp}, Flags);
		}
		SDValue getNode(unsigned Opcode, const SDLoc &DL, EVT VT, SDValue N1,
		SDValue N2, SDValue N3,
		const SDNodeFlags Flags = SDNodeFlags()) {
		unsigned EVLOpcode = ISD::GetEVLForFunctionOpCode(Opcode);
		int MaskPos = ISD::GetMaskPosEVL(EVLOpcode);
		int VLenPos = ISD::GetVectorLengthPosEVL(EVLOpcode);
		assert(MaskPos == 3 && VLenPos == 4);

		return DAG.getNode(EVLOpcode, DL, VT, {N1, N2, N3, RootMaskOp, RootVectorLenOp}, Flags);
		}
		};

} // end anonymous namespace		} // end anonymous namespace

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// TargetLowering::DAGCombinerInfo implementation		// TargetLowering::DAGCombinerInfo implementation
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

void TargetLowering::DAGCombinerInfo::AddToWorklist(SDNode *N) {		void TargetLowering::DAGCombinerInfo::AddToWorklist(SDNode *N) {
((DAGCombiner*)DC)->AddToWorklist(N);		((DAGCombiner*)DC)->AddToWorklist(N);
▲ Show 20 Lines • Show All 892 Lines • ▼ Show 20 Lines	SDValue DAGCombiner::visit(SDNode *N) {
case ISD::AssertZext: return visitAssertExt(N);		case ISD::AssertZext: return visitAssertExt(N);
case ISD::SIGN_EXTEND_INREG: return visitSIGN_EXTEND_INREG(N);		case ISD::SIGN_EXTEND_INREG: return visitSIGN_EXTEND_INREG(N);
case ISD::SIGN_EXTEND_VECTOR_INREG: return visitSIGN_EXTEND_VECTOR_INREG(N);		case ISD::SIGN_EXTEND_VECTOR_INREG: return visitSIGN_EXTEND_VECTOR_INREG(N);
case ISD::ZERO_EXTEND_VECTOR_INREG: return visitZERO_EXTEND_VECTOR_INREG(N);		case ISD::ZERO_EXTEND_VECTOR_INREG: return visitZERO_EXTEND_VECTOR_INREG(N);
case ISD::TRUNCATE: return visitTRUNCATE(N);		case ISD::TRUNCATE: return visitTRUNCATE(N);
case ISD::BITCAST: return visitBITCAST(N);		case ISD::BITCAST: return visitBITCAST(N);
case ISD::BUILD_PAIR: return visitBUILD_PAIR(N);		case ISD::BUILD_PAIR: return visitBUILD_PAIR(N);
case ISD::FADD: return visitFADD(N);		case ISD::FADD: return visitFADD(N);
		case ISD::EVL_FADD: return visitFADD_EVL(N);
case ISD::FSUB: return visitFSUB(N);		case ISD::FSUB: return visitFSUB(N);
case ISD::FMUL: return visitFMUL(N);		case ISD::FMUL: return visitFMUL(N);
case ISD::FMA: return visitFMA(N);		case ISD::FMA: return visitFMA(N);
case ISD::FDIV: return visitFDIV(N);		case ISD::FDIV: return visitFDIV(N);
case ISD::FREM: return visitFREM(N);		case ISD::FREM: return visitFREM(N);
case ISD::FSQRT: return visitFSQRT(N);		case ISD::FSQRT: return visitFSQRT(N);
case ISD::FCOPYSIGN: return visitFCOPYSIGN(N);		case ISD::FCOPYSIGN: return visitFCOPYSIGN(N);
case ISD::FPOW: return visitFPOW(N);		case ISD::FPOW: return visitFPOW(N);
▲ Show 20 Lines • Show All 8,875 Lines • ▼ Show 20 Lines	ConstantFoldBITCASTofBUILD_VECTOR(SDNode *BV, EVT DstEltVT) {
return DAG.getBuildVector(VT, DL, Ops);		return DAG.getBuildVector(VT, DL, Ops);
}		}

static bool isContractable(SDNode *N) {		static bool isContractable(SDNode *N) {
SDNodeFlags F = N->getFlags();		SDNodeFlags F = N->getFlags();
return F.hasAllowContract() \|\| F.hasAllowReassociation();		return F.hasAllowContract() \|\| F.hasAllowReassociation();
}		}


/// Try to perform FMA combining on a given FADD node.		/// Try to perform FMA combining on a given FADD node.
		template<class MatchContextClass>
SDValue DAGCombiner::visitFADDForFMACombine(SDNode *N) {		SDValue DAGCombiner::visitFADDForFMACombine(SDNode *N) {
SDValue N0 = N->getOperand(0);		SDValue N0 = N->getOperand(0);
SDValue N1 = N->getOperand(1);		SDValue N1 = N->getOperand(1);
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
SDLoc SL(N);		SDLoc SL(N);

		MatchContextClass matcher(DAG, N);
		if (!matcher.isCompatible(N0) \|\| !matcher.isCompatible(N1)) return SDValue();

const TargetOptions &Options = DAG.getTarget().Options;		const TargetOptions &Options = DAG.getTarget().Options;

// Floating-point multiply-add with intermediate rounding.		// Floating-point multiply-add with intermediate rounding.
bool HasFMAD = (LegalOperations && TLI.isOperationLegal(ISD::FMAD, VT));		bool HasFMAD = (LegalOperations && TLI.isOperationLegal(ISD::FMAD, VT));

// Floating-point multiply-add without intermediate rounding.		// Floating-point multiply-add without intermediate rounding.
bool HasFMA =		bool HasFMA =
TLI.isFMAFasterThanFMulAndFAdd(VT) &&		TLI.isFMAFasterThanFMulAndFAdd(VT) &&
Show All 16 Lines	if (STI && STI->generateFMAsInMachineCombiner(OptLevel))
return SDValue();		return SDValue();

// Always prefer FMAD to FMA for precision.		// Always prefer FMAD to FMA for precision.
unsigned PreferredFusedOpcode = HasFMAD ? ISD::FMAD : ISD::FMA;		unsigned PreferredFusedOpcode = HasFMAD ? ISD::FMAD : ISD::FMA;
bool Aggressive = TLI.enableAggressiveFMAFusion(VT);		bool Aggressive = TLI.enableAggressiveFMAFusion(VT);

// Is the node an FMUL and contractable either due to global flags or		// Is the node an FMUL and contractable either due to global flags or
// SDNodeFlags.		// SDNodeFlags.
auto isContractableFMUL = [AllowFusionGlobally](SDValue N) {		auto isContractableFMUL = [AllowFusionGlobally, &matcher](SDValue N) {
if (N.getOpcode() != ISD::FMUL)		if (!matcher.match(N, ISD::FMUL))
return false;		return false;
return AllowFusionGlobally \|\| isContractable(N.getNode());		return AllowFusionGlobally \|\| isContractable(N.getNode());
};		};
// If we have two choices trying to fold (fadd (fmul u, v), (fmul x, y)),		// If we have two choices trying to fold (fadd (fmul u, v), (fmul x, y)),
// prefer to fold the multiply with fewer uses.		// prefer to fold the multiply with fewer uses.
if (Aggressive && isContractableFMUL(N0) && isContractableFMUL(N1)) {		if (Aggressive && isContractableFMUL(N0) && isContractableFMUL(N1)) {
if (N0.getNode()->use_size() > N1.getNode()->use_size())		if (N0.getNode()->use_size() > N1.getNode()->use_size())
std::swap(N0, N1);		std::swap(N0, N1);
}		}

// fold (fadd (fmul x, y), z) -> (fma x, y, z)		// fold (fadd (fmul x, y), z) -> (fma x, y, z)
if (isContractableFMUL(N0) && (Aggressive \|\| N0->hasOneUse())) {		if (isContractableFMUL(N0) && (Aggressive \|\| N0->hasOneUse())) {
return DAG.getNode(PreferredFusedOpcode, SL, VT,		return matcher.getNode(PreferredFusedOpcode, SL, VT,
N0.getOperand(0), N0.getOperand(1), N1, Flags);		N0.getOperand(0), N0.getOperand(1), N1, Flags);
}		}

// fold (fadd x, (fmul y, z)) -> (fma y, z, x)		// fold (fadd x, (fmul y, z)) -> (fma y, z, x)
// Note: Commutes FADD operands.		// Note: Commutes FADD operands.
if (isContractableFMUL(N1) && (Aggressive \|\| N1->hasOneUse())) {		if (isContractableFMUL(N1) && (Aggressive \|\| N1->hasOneUse())) {
return DAG.getNode(PreferredFusedOpcode, SL, VT,		return matcher.getNode(PreferredFusedOpcode, SL, VT,
N1.getOperand(0), N1.getOperand(1), N0, Flags);		N1.getOperand(0), N1.getOperand(1), N0, Flags);
}		}

// Look through FP_EXTEND nodes to do more combining.		// Look through FP_EXTEND nodes to do more combining.

// fold (fadd (fpext (fmul x, y)), z) -> (fma (fpext x), (fpext y), z)		// fold (fadd (fpext (fmul x, y)), z) -> (fma (fpext x), (fpext y), z)
if (N0.getOpcode() == ISD::FP_EXTEND) {		if ((N0.getOpcode() == ISD::FP_EXTEND) && matcher.isCompatible(N0.getOperand(0))) {
SDValue N00 = N0.getOperand(0);		SDValue N00 = N0.getOperand(0);
if (isContractableFMUL(N00) &&		if (isContractableFMUL(N00) &&
TLI.isFPExtFoldable(PreferredFusedOpcode, VT, N00.getValueType())) {		TLI.isFPExtFoldable(PreferredFusedOpcode, VT, N00.getValueType())) {
return DAG.getNode(PreferredFusedOpcode, SL, VT,		return matcher.getNode(PreferredFusedOpcode, SL, VT,
DAG.getNode(ISD::FP_EXTEND, SL, VT,		matcher.getNode(ISD::FP_EXTEND, SL, VT,
N00.getOperand(0)),		N00.getOperand(0)),
DAG.getNode(ISD::FP_EXTEND, SL, VT,		matcher.getNode(ISD::FP_EXTEND, SL, VT,
N00.getOperand(1)), N1, Flags);		N00.getOperand(1)), N1, Flags);
}		}
}		}

// fold (fadd x, (fpext (fmul y, z))) -> (fma (fpext y), (fpext z), x)		// fold (fadd x, (fpext (fmul y, z))) -> (fma (fpext y), (fpext z), x)
// Note: Commutes FADD operands.		// Note: Commutes FADD operands.
if (N1.getOpcode() == ISD::FP_EXTEND) {		if (matcher.match(N1, ISD::FP_EXTEND)) {
SDValue N10 = N1.getOperand(0);		SDValue N10 = N1.getOperand(0);
if (isContractableFMUL(N10) &&		if (isContractableFMUL(N10) &&
TLI.isFPExtFoldable(PreferredFusedOpcode, VT, N10.getValueType())) {		TLI.isFPExtFoldable(PreferredFusedOpcode, VT, N10.getValueType())) {
return DAG.getNode(PreferredFusedOpcode, SL, VT,		return matcher.getNode(PreferredFusedOpcode, SL, VT,
DAG.getNode(ISD::FP_EXTEND, SL, VT,		matcher.getNode(ISD::FP_EXTEND, SL, VT,
N10.getOperand(0)),		N10.getOperand(0)),
DAG.getNode(ISD::FP_EXTEND, SL, VT,		matcher.getNode(ISD::FP_EXTEND, SL, VT,
N10.getOperand(1)), N0, Flags);		N10.getOperand(1)), N0, Flags);
}		}
}		}

// More folding opportunities when target permits.		// More folding opportunities when target permits.
if (Aggressive) {		if (Aggressive) {
// fold (fadd (fma x, y, (fmul u, v)), z) -> (fma x, y (fma u, v, z))		// fold (fadd (fma x, y, (fmul u, v)), z) -> (fma x, y (fma u, v, z))
if (CanFuse &&		if (CanFuse &&
N0.getOpcode() == PreferredFusedOpcode &&		matcher.match(N0, PreferredFusedOpcode) &&
N0.getOperand(2).getOpcode() == ISD::FMUL &&		matcher.match(N0.getOperand(2), ISD::FMUL) &&
N0->hasOneUse() && N0.getOperand(2)->hasOneUse()) {		N0->hasOneUse() && N0.getOperand(2)->hasOneUse()) {
return DAG.getNode(PreferredFusedOpcode, SL, VT,		return matcher.getNode(PreferredFusedOpcode, SL, VT,
N0.getOperand(0), N0.getOperand(1),		N0.getOperand(0), N0.getOperand(1),
DAG.getNode(PreferredFusedOpcode, SL, VT,		matcher.getNode(PreferredFusedOpcode, SL, VT,
N0.getOperand(2).getOperand(0),		N0.getOperand(2).getOperand(0),
N0.getOperand(2).getOperand(1),		N0.getOperand(2).getOperand(1),
N1, Flags), Flags);		N1, Flags), Flags);
}		}

// fold (fadd x, (fma y, z, (fmul u, v)) -> (fma y, z (fma u, v, x))		// fold (fadd x, (fma y, z, (fmul u, v)) -> (fma y, z (fma u, v, x))
if (CanFuse &&		if (CanFuse &&
N1->getOpcode() == PreferredFusedOpcode &&		matcher.match(N1, PreferredFusedOpcode) &&
N1.getOperand(2).getOpcode() == ISD::FMUL &&		matcher.match(N1.getOperand(2), ISD::FMUL) &&
N1->hasOneUse() && N1.getOperand(2)->hasOneUse()) {		N1->hasOneUse() && N1.getOperand(2)->hasOneUse()) {
return DAG.getNode(PreferredFusedOpcode, SL, VT,		return matcher.getNode(PreferredFusedOpcode, SL, VT,
N1.getOperand(0), N1.getOperand(1),		N1.getOperand(0), N1.getOperand(1),
DAG.getNode(PreferredFusedOpcode, SL, VT,		matcher.getNode(PreferredFusedOpcode, SL, VT,
N1.getOperand(2).getOperand(0),		N1.getOperand(2).getOperand(0),
N1.getOperand(2).getOperand(1),		N1.getOperand(2).getOperand(1),
N0, Flags), Flags);		N0, Flags), Flags);
}		}


// fold (fadd (fma x, y, (fpext (fmul u, v))), z)		// fold (fadd (fma x, y, (fpext (fmul u, v))), z)
// -> (fma x, y, (fma (fpext u), (fpext v), z))		// -> (fma x, y, (fma (fpext u), (fpext v), z))
auto FoldFAddFMAFPExtFMul = [&] (		auto FoldFAddFMAFPExtFMul = [&] (
SDValue X, SDValue Y, SDValue U, SDValue V, SDValue Z,		SDValue X, SDValue Y, SDValue U, SDValue V, SDValue Z,
SDNodeFlags Flags) {		SDNodeFlags Flags) {
return DAG.getNode(PreferredFusedOpcode, SL, VT, X, Y,		return matcher.getNode(PreferredFusedOpcode, SL, VT, X, Y,
DAG.getNode(PreferredFusedOpcode, SL, VT,		matcher.getNode(PreferredFusedOpcode, SL, VT,
DAG.getNode(ISD::FP_EXTEND, SL, VT, U),		matcher.getNode(ISD::FP_EXTEND, SL, VT, U),
DAG.getNode(ISD::FP_EXTEND, SL, VT, V),		matcher.getNode(ISD::FP_EXTEND, SL, VT, V),
Z, Flags), Flags);		Z, Flags), Flags);
};		};
if (N0.getOpcode() == PreferredFusedOpcode) {		if (matcher.match(N0, PreferredFusedOpcode)) {
SDValue N02 = N0.getOperand(2);		SDValue N02 = N0.getOperand(2);
if (N02.getOpcode() == ISD::FP_EXTEND) {		if (matcher.match(N02, ISD::FP_EXTEND)) {
SDValue N020 = N02.getOperand(0);		SDValue N020 = N02.getOperand(0);
if (isContractableFMUL(N020) &&		if (isContractableFMUL(N020) &&
TLI.isFPExtFoldable(PreferredFusedOpcode, VT, N020.getValueType())) {		TLI.isFPExtFoldable(PreferredFusedOpcode, VT, N020.getValueType())) {
return FoldFAddFMAFPExtFMul(N0.getOperand(0), N0.getOperand(1),		return FoldFAddFMAFPExtFMul(N0.getOperand(0), N0.getOperand(1),
N020.getOperand(0), N020.getOperand(1),		N020.getOperand(0), N020.getOperand(1),
N1, Flags);		N1, Flags);
}		}
}		}
}		}

// fold (fadd (fpext (fma x, y, (fmul u, v))), z)		// fold (fadd (fpext (fma x, y, (fmul u, v))), z)
// -> (fma (fpext x), (fpext y), (fma (fpext u), (fpext v), z))		// -> (fma (fpext x), (fpext y), (fma (fpext u), (fpext v), z))
// FIXME: This turns two single-precision and one double-precision		// FIXME: This turns two single-precision and one double-precision
// operation into two double-precision operations, which might not be		// operation into two double-precision operations, which might not be
// interesting for all targets, especially GPUs.		// interesting for all targets, especially GPUs.
auto FoldFAddFPExtFMAFMul = [&] (		auto FoldFAddFPExtFMAFMul = [&] (
SDValue X, SDValue Y, SDValue U, SDValue V, SDValue Z,		SDValue X, SDValue Y, SDValue U, SDValue V, SDValue Z,
SDNodeFlags Flags) {		SDNodeFlags Flags) {
return DAG.getNode(PreferredFusedOpcode, SL, VT,		return matcher.getNode(PreferredFusedOpcode, SL, VT,
DAG.getNode(ISD::FP_EXTEND, SL, VT, X),		matcher.getNode(ISD::FP_EXTEND, SL, VT, X),
DAG.getNode(ISD::FP_EXTEND, SL, VT, Y),		matcher.getNode(ISD::FP_EXTEND, SL, VT, Y),
DAG.getNode(PreferredFusedOpcode, SL, VT,		matcher.getNode(PreferredFusedOpcode, SL, VT,
DAG.getNode(ISD::FP_EXTEND, SL, VT, U),		matcher.getNode(ISD::FP_EXTEND, SL, VT, U),
DAG.getNode(ISD::FP_EXTEND, SL, VT, V),		matcher.getNode(ISD::FP_EXTEND, SL, VT, V),
Z, Flags), Flags);		Z, Flags), Flags);
};		};
if (N0.getOpcode() == ISD::FP_EXTEND) {		if (N0.getOpcode() == ISD::FP_EXTEND) {
SDValue N00 = N0.getOperand(0);		SDValue N00 = N0.getOperand(0);
if (N00.getOpcode() == PreferredFusedOpcode) {		if (N00.getOpcode() == PreferredFusedOpcode) {
SDValue N002 = N00.getOperand(2);		SDValue N002 = N00.getOperand(2);
if (isContractableFMUL(N002) &&		if (isContractableFMUL(N002) &&
TLI.isFPExtFoldable(PreferredFusedOpcode, VT, N00.getValueType())) {		TLI.isFPExtFoldable(PreferredFusedOpcode, VT, N00.getValueType())) {
▲ Show 20 Lines • Show All 420 Lines • ▼ Show 20 Lines	SDValue DAGCombiner::visitFMULForFMADistributiveCombine(SDNode *N) {
if (SDValue FMA = FuseFSUB(N0, N1, Flags))		if (SDValue FMA = FuseFSUB(N0, N1, Flags))
return FMA;		return FMA;
if (SDValue FMA = FuseFSUB(N1, N0, Flags))		if (SDValue FMA = FuseFSUB(N1, N0, Flags))
return FMA;		return FMA;

return SDValue();		return SDValue();
}		}

		SDValue DAGCombiner::visitFADD_EVL(SDNode *N) {
		// FADD -> FMA combines:
		if (SDValue Fused = visitFADDForFMACombine<EVLMatchContext>(N)) {
		AddToWorklist(Fused.getNode());
		return Fused;
		}
		return SDValue();
		}

SDValue DAGCombiner::visitFADD(SDNode *N) {		SDValue DAGCombiner::visitFADD(SDNode *N) {
SDValue N0 = N->getOperand(0);		SDValue N0 = N->getOperand(0);
SDValue N1 = N->getOperand(1);		SDValue N1 = N->getOperand(1);
bool N0CFP = isConstantFPBuildVectorOrConstantFP(N0);		bool N0CFP = isConstantFPBuildVectorOrConstantFP(N0);
bool N1CFP = isConstantFPBuildVectorOrConstantFP(N1);		bool N1CFP = isConstantFPBuildVectorOrConstantFP(N1);
EVT VT = N->getValueType(0);		EVT VT = N->getValueType(0);
SDLoc DL(N);		SDLoc DL(N);
const TargetOptions &Options = DAG.getTarget().Options;		const TargetOptions &Options = DAG.getTarget().Options;
▲ Show 20 Lines • Show All 154 Lines • ▼ Show 20 Lines	if (TLI.isOperationLegalOrCustom(ISD::FMUL, VT) && !N0CFP && !N1CFP) {
N0.getOperand(0) == N1.getOperand(0)) {		N0.getOperand(0) == N1.getOperand(0)) {
return DAG.getNode(ISD::FMUL, DL, VT, N0.getOperand(0),		return DAG.getNode(ISD::FMUL, DL, VT, N0.getOperand(0),
DAG.getConstantFP(4.0, DL, VT), Flags);		DAG.getConstantFP(4.0, DL, VT), Flags);
}		}
}		}
} // enable-unsafe-fp-math		} // enable-unsafe-fp-math

// FADD -> FMA combines:		// FADD -> FMA combines:
if (SDValue Fused = visitFADDForFMACombine(N)) {		if (SDValue Fused = visitFADDForFMACombine<EmptyMatchContext>(N)) {
AddToWorklist(Fused.getNode());		AddToWorklist(Fused.getNode());
return Fused;		return Fused;
}		}
return SDValue();		return SDValue();
}		}

SDValue DAGCombiner::visitFSUB(SDNode *N) {		SDValue DAGCombiner::visitFSUB(SDNode *N) {
SDValue N0 = N->getOperand(0);		SDValue N0 = N->getOperand(0);
▲ Show 20 Lines • Show All 6,492 Lines • ▼ Show 20 Lines	auto ScaleShuffleMask = [](ArrayRef<int> Mask, int Scale) {
return SmallVector<int, 8>(Mask.begin(), Mask.end());		return SmallVector<int, 8>(Mask.begin(), Mask.end());

SmallVector<int, 8> NewMask;		SmallVector<int, 8> NewMask;
for (int M : Mask)		for (int M : Mask)
for (int s = 0; s != Scale; ++s)		for (int s = 0; s != Scale; ++s)
NewMask.push_back(M < 0 ? -1 : Scale * M + s);		NewMask.push_back(M < 0 ? -1 : Scale * M + s);
return NewMask;		return NewMask;
};		};

SDValue BC0 = peekThroughOneUseBitcasts(N0);		SDValue BC0 = peekThroughOneUseBitcasts(N0);
if (BC0.getOpcode() == ISD::VECTOR_SHUFFLE && BC0.hasOneUse()) {		if (BC0.getOpcode() == ISD::VECTOR_SHUFFLE && BC0.hasOneUse()) {
EVT SVT = VT.getScalarType();		EVT SVT = VT.getScalarType();
EVT InnerVT = BC0->getValueType(0);		EVT InnerVT = BC0->getValueType(0);
EVT InnerSVT = InnerVT.getScalarType();		EVT InnerSVT = InnerVT.getScalarType();

// Determine which shuffle works with the smaller scalar type.		// Determine which shuffle works with the smaller scalar type.
EVT ScaleVT = SVT.bitsLT(InnerSVT) ? VT : InnerVT;		EVT ScaleVT = SVT.bitsLT(InnerSVT) ? VT : InnerVT;
▲ Show 20 Lines • Show All 1,703 Lines • Show Last 20 Lines

lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp

Show First 20 Lines • Show All 879 Lines • ▼ Show 20 Lines	SDValue DAGTypeLegalizer::PromoteIntRes_UADDSUBO(SDNode *N, unsigned ResNo) {

// Use the calculated overflow everywhere.		// Use the calculated overflow everywhere.
ReplaceValueWith(SDValue(N, 1), Ofl);		ReplaceValueWith(SDValue(N, 1), Ofl);

return Res;		return Res;
}		}

// Handle promotion for the ADDE/SUBE/ADDCARRY/SUBCARRY nodes. Notice that		// Handle promotion for the ADDE/SUBE/ADDCARRY/SUBCARRY nodes. Notice that
// the third operand of ADDE/SUBE nodes is carry flag, which differs from		// the third operand of ADDE/SUBE nodes is carry flag, which differs from
// the ADDCARRY/SUBCARRY nodes in that the third operand is carry Boolean.		// the ADDCARRY/SUBCARRY nodes in that the third operand is carry Boolean.
SDValue DAGTypeLegalizer::PromoteIntRes_ADDSUBCARRY(SDNode *N, unsigned ResNo) {		SDValue DAGTypeLegalizer::PromoteIntRes_ADDSUBCARRY(SDNode *N, unsigned ResNo) {
if (ResNo == 1)		if (ResNo == 1)
return PromoteIntRes_Overflow(N);		return PromoteIntRes_Overflow(N);

// We need to sign-extend the operands so the carry value computed by the		// We need to sign-extend the operands so the carry value computed by the
// wide operation will be equivalent to the carry value computed by the		// wide operation will be equivalent to the carry value computed by the
// narrow operation.		// narrow operation.
▲ Show 20 Lines • Show All 129 Lines • ▼ Show 20 Lines	LLVM_DEBUG(dbgs() << "Promote integer operand: "; N->dump(&DAG);
dbgs() << "\n");		dbgs() << "\n");
SDValue Res = SDValue();		SDValue Res = SDValue();

if (CustomLowerNode(N, N->getOperand(OpNo).getValueType(), false)) {		if (CustomLowerNode(N, N->getOperand(OpNo).getValueType(), false)) {
LLVM_DEBUG(dbgs() << "Node has been custom lowered, done\n");		LLVM_DEBUG(dbgs() << "Node has been custom lowered, done\n");
return false;		return false;
}		}

		if (N->isEVL()) {
		Res = PromoteIntOp_EVL(N, OpNo);
		} else {
switch (N->getOpcode()) {		switch (N->getOpcode()) {
default:		default:
#ifndef NDEBUG		#ifndef NDEBUG
dbgs() << "PromoteIntegerOperand Op #" << OpNo << ": ";		dbgs() << "PromoteIntegerOperand Op #" << OpNo << ": ";
N->dump(&DAG); dbgs() << "\n";		N->dump(&DAG); dbgs() << "\n";
#endif		#endif
llvm_unreachable("Do not know how to promote this operator's operand!");		llvm_unreachable("Do not know how to promote this operator's operand!");

▲ Show 20 Lines • Show All 45 Lines • ▼ Show 20 Lines	bool DAGTypeLegalizer::PromoteIntegerOperand(SDNode *N, unsigned OpNo) {

case ISD::FRAMEADDR:		case ISD::FRAMEADDR:
case ISD::RETURNADDR: Res = PromoteIntOp_FRAMERETURNADDR(N); break;		case ISD::RETURNADDR: Res = PromoteIntOp_FRAMERETURNADDR(N); break;

case ISD::PREFETCH: Res = PromoteIntOp_PREFETCH(N, OpNo); break;		case ISD::PREFETCH: Res = PromoteIntOp_PREFETCH(N, OpNo); break;

case ISD::SMULFIX: Res = PromoteIntOp_SMULFIX(N); break;		case ISD::SMULFIX: Res = PromoteIntOp_SMULFIX(N); break;
}		}
		}

// If the result is null, the sub-method took care of registering results etc.		// If the result is null, the sub-method took care of registering results etc.
if (!Res.getNode()) return false;		if (!Res.getNode()) return false;

// If the result is N, the sub-method updated N in place. Tell the legalizer		// If the result is N, the sub-method updated N in place. Tell the legalizer
// core about this.		// core about this.
if (Res.getNode() == N)		if (Res.getNode() == N)
return true;		return true;
▲ Show 20 Lines • Show All 257 Lines • ▼ Show 20 Lines	if (OpNo == 3) {
TruncateStore = true;		TruncateStore = true;
}		}

return DAG.getMaskedStore(N->getChain(), dl, DataOp, N->getBasePtr(), Mask,		return DAG.getMaskedStore(N->getChain(), dl, DataOp, N->getBasePtr(), Mask,
N->getMemoryVT(), N->getMemOperand(),		N->getMemoryVT(), N->getMemOperand(),
TruncateStore, N->isCompressingStore());		TruncateStore, N->isCompressingStore());
}		}

		SDValue DAGTypeLegalizer::PromoteIntOp_EVL(SDNode *N, unsigned OpNo) {
		EVT DataVT;
		switch (N->getOpcode()) {
		default:
		DataVT = N->getValueType(0);
		break;

		case ISD::EVL_STORE:
		case ISD::EVL_SCATTER:
		llvm_unreachable("TODO implement EVL memory nodes");
		}

		// TODO assert that \p OpNo is the mask
		SDValue Mask = PromoteTargetBoolean(N->getOperand(OpNo), DataVT);
		SmallVector<SDValue, 4> NewOps(N->op_begin(), N->op_end());
		NewOps[OpNo] = Mask;
		return SDValue(DAG.UpdateNodeOperands(N, NewOps), 0);
		}

SDValue DAGTypeLegalizer::PromoteIntOp_MLOAD(MaskedLoadSDNode *N,		SDValue DAGTypeLegalizer::PromoteIntOp_MLOAD(MaskedLoadSDNode *N,
unsigned OpNo) {		unsigned OpNo) {
assert(OpNo == 2 && "Only know how to promote the mask!");		assert(OpNo == 2 && "Only know how to promote the mask!");
EVT DataVT = N->getValueType(0);		EVT DataVT = N->getValueType(0);
SDValue Mask = PromoteTargetBoolean(N->getOperand(OpNo), DataVT);		SDValue Mask = PromoteTargetBoolean(N->getOperand(OpNo), DataVT);
SmallVector<SDValue, 4> NewOps(N->op_begin(), N->op_end());		SmallVector<SDValue, 4> NewOps(N->op_begin(), N->op_end());
NewOps[OpNo] = Mask;		NewOps[OpNo] = Mask;
return SDValue(DAG.UpdateNodeOperands(N, NewOps), 0);		return SDValue(DAG.UpdateNodeOperands(N, NewOps), 0);
▲ Show 20 Lines • Show All 2,486 Lines • Show Last 20 Lines

lib/CodeGen/SelectionDAG/LegalizeTypes.h

Show First 20 Lines • Show All 341 Lines • ▼ Show 20 Lines	private:
SDValue PromoteIntRes_ADDSUBCARRY(SDNode *N, unsigned ResNo);		SDValue PromoteIntRes_ADDSUBCARRY(SDNode *N, unsigned ResNo);
SDValue PromoteIntRes_UNDEF(SDNode *N);		SDValue PromoteIntRes_UNDEF(SDNode *N);
SDValue PromoteIntRes_VAARG(SDNode *N);		SDValue PromoteIntRes_VAARG(SDNode *N);
SDValue PromoteIntRes_XMULO(SDNode *N, unsigned ResNo);		SDValue PromoteIntRes_XMULO(SDNode *N, unsigned ResNo);
SDValue PromoteIntRes_ADDSUBSAT(SDNode *N);		SDValue PromoteIntRes_ADDSUBSAT(SDNode *N);
SDValue PromoteIntRes_SMULFIX(SDNode *N);		SDValue PromoteIntRes_SMULFIX(SDNode *N);
SDValue PromoteIntRes_FLT_ROUNDS(SDNode *N);		SDValue PromoteIntRes_FLT_ROUNDS(SDNode *N);


// Integer Operand Promotion.		// Integer Operand Promotion.
bool PromoteIntegerOperand(SDNode *N, unsigned OpNo);		bool PromoteIntegerOperand(SDNode *N, unsigned OpNo);
SDValue PromoteIntOp_ANY_EXTEND(SDNode *N);		SDValue PromoteIntOp_ANY_EXTEND(SDNode *N);
SDValue PromoteIntOp_ATOMIC_STORE(AtomicSDNode *N);		SDValue PromoteIntOp_ATOMIC_STORE(AtomicSDNode *N);
SDValue PromoteIntOp_BITCAST(SDNode *N);		SDValue PromoteIntOp_BITCAST(SDNode *N);
SDValue PromoteIntOp_BUILD_PAIR(SDNode *N);		SDValue PromoteIntOp_BUILD_PAIR(SDNode *N);
SDValue PromoteIntOp_BR_CC(SDNode *N, unsigned OpNo);		SDValue PromoteIntOp_BR_CC(SDNode *N, unsigned OpNo);
SDValue PromoteIntOp_BRCOND(SDNode *N, unsigned OpNo);		SDValue PromoteIntOp_BRCOND(SDNode *N, unsigned OpNo);
Show All 16 Lines	private:
SDValue PromoteIntOp_MSTORE(MaskedStoreSDNode *N, unsigned OpNo);		SDValue PromoteIntOp_MSTORE(MaskedStoreSDNode *N, unsigned OpNo);
SDValue PromoteIntOp_MLOAD(MaskedLoadSDNode *N, unsigned OpNo);		SDValue PromoteIntOp_MLOAD(MaskedLoadSDNode *N, unsigned OpNo);
SDValue PromoteIntOp_MSCATTER(MaskedScatterSDNode *N, unsigned OpNo);		SDValue PromoteIntOp_MSCATTER(MaskedScatterSDNode *N, unsigned OpNo);
SDValue PromoteIntOp_MGATHER(MaskedGatherSDNode *N, unsigned OpNo);		SDValue PromoteIntOp_MGATHER(MaskedGatherSDNode *N, unsigned OpNo);
SDValue PromoteIntOp_ADDSUBCARRY(SDNode *N, unsigned OpNo);		SDValue PromoteIntOp_ADDSUBCARRY(SDNode *N, unsigned OpNo);
SDValue PromoteIntOp_FRAMERETURNADDR(SDNode *N);		SDValue PromoteIntOp_FRAMERETURNADDR(SDNode *N);
SDValue PromoteIntOp_PREFETCH(SDNode *N, unsigned OpNo);		SDValue PromoteIntOp_PREFETCH(SDNode *N, unsigned OpNo);
SDValue PromoteIntOp_SMULFIX(SDNode *N);		SDValue PromoteIntOp_SMULFIX(SDNode *N);
		SDValue PromoteIntOp_EVL(SDNode *N, unsigned OpNo);

void PromoteSetCCOperands(SDValue &LHS,SDValue &RHS, ISD::CondCode Code);		void PromoteSetCCOperands(SDValue &LHS,SDValue &RHS, ISD::CondCode Code);

//===--------------------------------------------------------------------===//		//===--------------------------------------------------------------------===//
// Integer Expansion Support: LegalizeIntegerTypes.cpp		// Integer Expansion Support: LegalizeIntegerTypes.cpp
//===--------------------------------------------------------------------===//		//===--------------------------------------------------------------------===//

/// Given a processed operand Op which was expanded into two integers of half		/// Given a processed operand Op which was expanded into two integers of half
▲ Show 20 Lines • Show All 558 Lines • Show Last 20 Lines

lib/CodeGen/SelectionDAG/SelectionDAG.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 427 Lines • ▼ Show 20 Lines	if (IsInteger) {
case ISD::SETOGT: Result = ISD::SETUGT ; break; // SETUGT & SETNE		case ISD::SETOGT: Result = ISD::SETUGT ; break; // SETUGT & SETNE
}		}
}		}

return Result;		return Result;
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
		// SDNode EVL Support
		//===----------------------------------------------------------------------===//

		int
		ISD::GetMaskPosEVL(unsigned OpCode) {
		switch (OpCode) {
		default: return -1;

		case ISD::EVL_FNEG:
		return 1;

		case ISD::EVL_ADD:
		case ISD::EVL_SUB:
		case ISD::EVL_MUL:
		case ISD::EVL_SDIV:
		case ISD::EVL_SREM:
		case ISD::EVL_UDIV:
		case ISD::EVL_UREM:

		case ISD::EVL_AND:
		case ISD::EVL_OR:
		case ISD::EVL_XOR:
		case ISD::EVL_SHL:
		case ISD::EVL_SRA:
		case ISD::EVL_SRL:
		case ISD::EVL_FDIV:
		case ISD::EVL_FREM:

		case ISD::EVL_FADD:
		case ISD::EVL_FMUL:
		return 2;

		case ISD::EVL_FMA:
		case ISD::EVL_SELECT:
		return 3;

		case EVL_REDUCE_FADD:
		case EVL_REDUCE_FMUL:
		case EVL_REDUCE_ADD:
		case EVL_REDUCE_MUL:
		case EVL_REDUCE_AND:
		case EVL_REDUCE_OR:
		case EVL_REDUCE_XOR:
		case EVL_REDUCE_SMAX:
		case EVL_REDUCE_SMIN:
		case EVL_REDUCE_UMAX:
		case EVL_REDUCE_UMIN:
		case VECREDUCE_FMAX:
		case VECREDUCE_FMIN:
		case EVL_REDUCE_FMAX:
		case EVL_REDUCE_FMIN:
		return 1;

		/// FMIN/FMAX nodes can have flags, for NaN/NoNaN variants.
		// (implicit) case ISD::EVL_COMPOSE: return -1
		}
		}

		int
		ISD::GetVectorLengthPosEVL(unsigned OpCode) {
		switch (OpCode) {
		default: return -1;

		case ISD::EVL_SELECT:
		return 0;

		case ISD::EVL_FNEG:
		return 2;

		case ISD::EVL_ADD:
		case ISD::EVL_SUB:
		case ISD::EVL_MUL:
		case ISD::EVL_SDIV:
		case ISD::EVL_SREM:
		case ISD::EVL_UDIV:
		case ISD::EVL_UREM:

		case ISD::EVL_AND:
		case ISD::EVL_OR:
		case ISD::EVL_XOR:
		case ISD::EVL_SHL:
		case ISD::EVL_SRA:
		case ISD::EVL_SRL:

		case ISD::EVL_FADD:
		case ISD::EVL_FMUL:
		case ISD::EVL_FDIV:
		case ISD::EVL_FREM:
		return 3;

		case ISD::EVL_FMA:
		return 4;

		case ISD::EVL_COMPOSE:
		return 3;

		case EVL_REDUCE_FADD:
		case EVL_REDUCE_FMUL:
		case EVL_REDUCE_ADD:
		case EVL_REDUCE_MUL:
		case EVL_REDUCE_AND:
		case EVL_REDUCE_OR:
		case EVL_REDUCE_XOR:
		case EVL_REDUCE_SMAX:
		case EVL_REDUCE_SMIN:
		case EVL_REDUCE_UMAX:
		case EVL_REDUCE_UMIN:
		case EVL_REDUCE_FMAX:
		case EVL_REDUCE_FMIN:
		return 2;
		}
		}

		unsigned
		ISD::GetFunctionOpCodeForEVL(unsigned OpCode) {
		switch (OpCode) {
		default: return OpCode;

		case ISD::EVL_SELECT: return ISD::VSELECT;
		case ISD::EVL_FNEG: return ISD::FNEG;
		case ISD::EVL_ADD: return ISD::ADD;
		case ISD::EVL_SUB: return ISD::SUB;
		case ISD::EVL_MUL: return ISD::MUL;
		case ISD::EVL_SDIV: return ISD::SDIV;
		case ISD::EVL_SREM: return ISD::SREM;
		case ISD::EVL_UDIV: return ISD::UDIV;
		case ISD::EVL_UREM: return ISD::UREM;

		case ISD::EVL_AND: return ISD::AND;
		case ISD::EVL_OR: return ISD::OR;
		case ISD::EVL_XOR: return ISD::XOR;
		case ISD::EVL_SHL: return ISD::SHL;
		case ISD::EVL_SRA: return ISD::SRA;
		case ISD::EVL_SRL: return ISD::SRL;
		case ISD::EVL_FDIV: return ISD::FDIV;
		case ISD::EVL_FREM: return ISD::FREM;

		case ISD::EVL_FADD: return ISD::FADD;
		case ISD::EVL_FMUL: return ISD::FMUL;

		case ISD::EVL_FMA: return ISD::FMA;
		}
		}

		unsigned
		ISD::GetEVLForFunctionOpCode(unsigned OpCode) {
		switch (OpCode) {
		default: llvm_unreachable("can not translate this Opcode to EVL");

		case ISD::VSELECT:return ISD::EVL_SELECT;
		case ISD::FNEG: return ISD::EVL_FNEG;
		case ISD::ADD: return ISD::EVL_ADD;
		case ISD::SUB: return ISD::EVL_SUB;
		case ISD::MUL: return ISD::EVL_MUL;
		case ISD::SDIV: return ISD::EVL_SDIV;
		case ISD::SREM: return ISD::EVL_SREM;
		case ISD::UDIV: return ISD::EVL_UDIV;
		case ISD::UREM: return ISD::EVL_UREM;

		case ISD::AND: return ISD::EVL_AND;
		case ISD::OR: return ISD::EVL_OR;
		case ISD::XOR: return ISD::EVL_XOR;
		case ISD::SHL: return ISD::EVL_SHL;
		case ISD::SRA: return ISD::EVL_SRA;
		case ISD::SRL: return ISD::EVL_SRL;
		case ISD::FDIV: return ISD::EVL_FDIV;
		case ISD::FREM: return ISD::EVL_FREM;

		case ISD::FADD: return ISD::EVL_FADD;
		case ISD::FMUL: return ISD::EVL_FMUL;

		case ISD::FMA: return ISD::EVL_FMA;
		}
		}


		//===----------------------------------------------------------------------===//
// SDNode Profile Support		// SDNode Profile Support
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

/// AddNodeIDOpcode - Add the node opcode to the NodeID data.		/// AddNodeIDOpcode - Add the node opcode to the NodeID data.
static void AddNodeIDOpcode(FoldingSetNodeID &ID, unsigned OpC) {		static void AddNodeIDOpcode(FoldingSetNodeID &ID, unsigned OpC) {
ID.AddInteger(OpC);		ID.AddInteger(OpC);
}		}

▲ Show 20 Lines • Show All 106 Lines • ▼ Show 20 Lines	static void AddNodeIDCustom(FoldingSetNodeID &ID, const SDNode *N) {
}		}
case ISD::STORE: {		case ISD::STORE: {
const StoreSDNode *ST = cast<StoreSDNode>(N);		const StoreSDNode *ST = cast<StoreSDNode>(N);
ID.AddInteger(ST->getMemoryVT().getRawBits());		ID.AddInteger(ST->getMemoryVT().getRawBits());
ID.AddInteger(ST->getRawSubclassData());		ID.AddInteger(ST->getRawSubclassData());
ID.AddInteger(ST->getPointerInfo().getAddrSpace());		ID.AddInteger(ST->getPointerInfo().getAddrSpace());
break;		break;
}		}
		case ISD::EVL_LOAD: {
		const EVLLoadSDNode *ELD = cast<EVLLoadSDNode>(N);
		ID.AddInteger(ELD->getMemoryVT().getRawBits());
		ID.AddInteger(ELD->getRawSubclassData());
		ID.AddInteger(ELD->getPointerInfo().getAddrSpace());
		break;
		}
		case ISD::EVL_STORE: {
		const EVLStoreSDNode *EST = cast<EVLStoreSDNode>(N);
		ID.AddInteger(EST->getMemoryVT().getRawBits());
		ID.AddInteger(EST->getRawSubclassData());
		ID.AddInteger(EST->getPointerInfo().getAddrSpace());
		break;
		}
		case ISD::EVL_GATHER: {
		const EVLGatherSDNode *EG = cast<EVLGatherSDNode>(N);
		ID.AddInteger(EG->getMemoryVT().getRawBits());
		ID.AddInteger(EG->getRawSubclassData());
		ID.AddInteger(EG->getPointerInfo().getAddrSpace());
		break;
		}
		case ISD::EVL_SCATTER: {
		const EVLScatterSDNode *ES = cast<EVLScatterSDNode>(N);
		ID.AddInteger(ES->getMemoryVT().getRawBits());
		ID.AddInteger(ES->getRawSubclassData());
		ID.AddInteger(ES->getPointerInfo().getAddrSpace());
		break;
		}
case ISD::MLOAD: {		case ISD::MLOAD: {
const MaskedLoadSDNode *MLD = cast<MaskedLoadSDNode>(N);		const MaskedLoadSDNode *MLD = cast<MaskedLoadSDNode>(N);
ID.AddInteger(MLD->getMemoryVT().getRawBits());		ID.AddInteger(MLD->getMemoryVT().getRawBits());
ID.AddInteger(MLD->getRawSubclassData());		ID.AddInteger(MLD->getRawSubclassData());
ID.AddInteger(MLD->getPointerInfo().getAddrSpace());		ID.AddInteger(MLD->getPointerInfo().getAddrSpace());
break;		break;
}		}
case ISD::MSTORE: {		case ISD::MSTORE: {
▲ Show 20 Lines • Show All 6,297 Lines • ▼ Show 20 Lines	SDValue SelectionDAG::getIndexedStore(SDValue OrigStore, const SDLoc &dl,

CSEMap.InsertNode(N, IP);		CSEMap.InsertNode(N, IP);
InsertNode(N);		InsertNode(N);
SDValue V(N, 0);		SDValue V(N, 0);
NewSDValueDbgMsg(V, "Creating new node: ", this);		NewSDValueDbgMsg(V, "Creating new node: ", this);
return V;		return V;
}		}

		SDValue SelectionDAG::getLoadEVL(EVT VT, const SDLoc &dl, SDValue Chain,
		SDValue Ptr, SDValue Mask, SDValue VLen,
		EVT MemVT, MachineMemOperand *MMO,
		ISD::LoadExtType ExtTy) {
		SDVTList VTs = getVTList(VT, MVT::Other);
		SDValue Ops[] = { Chain, Ptr, Mask, VLen };
		FoldingSetNodeID ID;
		AddNodeIDNode(ID, ISD::EVL_LOAD, VTs, Ops);
		ID.AddInteger(VT.getRawBits());
		ID.AddInteger(getSyntheticNodeSubclassData<EVLLoadSDNode>(
		dl.getIROrder(), VTs, ExtTy, MemVT, MMO));
		ID.AddInteger(MMO->getPointerInfo().getAddrSpace());
		void *IP = nullptr;
		if (SDNode *E = FindNodeOrInsertPos(ID, dl, IP)) {
		cast<EVLLoadSDNode>(E)->refineAlignment(MMO);
		return SDValue(E, 0);
		}
		auto *N = newSDNode<EVLLoadSDNode>(dl.getIROrder(), dl.getDebugLoc(), VTs,
		ExtTy, MemVT, MMO);
		createOperands(N, Ops);

		CSEMap.InsertNode(N, IP);
		InsertNode(N);
		SDValue V(N, 0);
		NewSDValueDbgMsg(V, "Creating new node: ", this);
		return V;
		}

SDValue SelectionDAG::getMaskedLoad(EVT VT, const SDLoc &dl, SDValue Chain,		SDValue SelectionDAG::getMaskedLoad(EVT VT, const SDLoc &dl, SDValue Chain,
SDValue Ptr, SDValue Mask, SDValue PassThru,		SDValue Ptr, SDValue Mask, SDValue PassThru,
EVT MemVT, MachineMemOperand *MMO,		EVT MemVT, MachineMemOperand *MMO,
ISD::LoadExtType ExtTy, bool isExpanding) {		ISD::LoadExtType ExtTy, bool isExpanding) {
SDVTList VTs = getVTList(VT, MVT::Other);		SDVTList VTs = getVTList(VT, MVT::Other);
SDValue Ops[] = { Chain, Ptr, Mask, PassThru };		SDValue Ops[] = { Chain, Ptr, Mask, PassThru };
FoldingSetNodeID ID;		FoldingSetNodeID ID;
AddNodeIDNode(ID, ISD::MLOAD, VTs, Ops);		AddNodeIDNode(ID, ISD::MLOAD, VTs, Ops);
Show All 12 Lines	SDValue SelectionDAG::getMaskedLoad(EVT VT, const SDLoc &dl, SDValue Chain,

CSEMap.InsertNode(N, IP);		CSEMap.InsertNode(N, IP);
InsertNode(N);		InsertNode(N);
SDValue V(N, 0);		SDValue V(N, 0);
NewSDValueDbgMsg(V, "Creating new node: ", this);		NewSDValueDbgMsg(V, "Creating new node: ", this);
return V;		return V;
}		}

		SDValue SelectionDAG::getStoreEVL(SDValue Chain, const SDLoc &dl,
		SDValue Val, SDValue Ptr, SDValue Mask,
		SDValue VLen, EVT MemVT, MachineMemOperand *MMO,
		bool IsTruncating) {
		assert(Chain.getValueType() == MVT::Other &&
		"Invalid chain type");
		EVT VT = Val.getValueType();
		SDVTList VTs = getVTList(MVT::Other);
		SDValue Ops[] = { Chain, Val, Ptr, Mask, VLen };
		FoldingSetNodeID ID;
		AddNodeIDNode(ID, ISD::EVL_STORE, VTs, Ops);
		ID.AddInteger(VT.getRawBits());
		ID.AddInteger(getSyntheticNodeSubclassData<EVLStoreSDNode>(
		dl.getIROrder(), VTs, IsTruncating, MemVT, MMO));
		ID.AddInteger(MMO->getPointerInfo().getAddrSpace());
		void *IP = nullptr;
		if (SDNode *E = FindNodeOrInsertPos(ID, dl, IP)) {
		cast<EVLStoreSDNode>(E)->refineAlignment(MMO);
		return SDValue(E, 0);
		}
		auto *N = newSDNode<EVLStoreSDNode>(dl.getIROrder(), dl.getDebugLoc(), VTs,
		IsTruncating, MemVT, MMO);
		createOperands(N, Ops);

		CSEMap.InsertNode(N, IP);
		InsertNode(N);
		SDValue V(N, 0);
		NewSDValueDbgMsg(V, "Creating new node: ", this);
		return V;
		}

		SDValue SelectionDAG::getGatherEVL(SDVTList VTs, EVT VT, const SDLoc &dl,
		ArrayRef<SDValue> Ops,
		MachineMemOperand *MMO) {
		assert(Ops.size() == 6 && "Incompatible number of operands");

		FoldingSetNodeID ID;
		AddNodeIDNode(ID, ISD::EVL_GATHER, VTs, Ops);
		ID.AddInteger(VT.getRawBits());
		ID.AddInteger(getSyntheticNodeSubclassData<MaskedGatherSDNode>(
		dl.getIROrder(), VTs, VT, MMO));
		ID.AddInteger(MMO->getPointerInfo().getAddrSpace());
		void *IP = nullptr;
		if (SDNode *E = FindNodeOrInsertPos(ID, dl, IP)) {
		cast<EVLGatherSDNode>(E)->refineAlignment(MMO);
		return SDValue(E, 0);
		}

		auto *N = newSDNode<EVLGatherSDNode>(dl.getIROrder(), dl.getDebugLoc(),
		VTs, VT, MMO);
		createOperands(N, Ops);

		assert(N->getMask().getValueType().getVectorNumElements() ==
		N->getValueType(0).getVectorNumElements() &&
		"Vector width mismatch between mask and data");
		assert(N->getIndex().getValueType().getVectorNumElements() >=
		N->getValueType(0).getVectorNumElements() &&
		"Vector width mismatch between index and data");
		assert(isa<ConstantSDNode>(N->getScale()) &&
		cast<ConstantSDNode>(N->getScale())->getAPIntValue().isPowerOf2() &&
		"Scale should be a constant power of 2");

		CSEMap.InsertNode(N, IP);
		InsertNode(N);
		SDValue V(N, 0);
		NewSDValueDbgMsg(V, "Creating new node: ", this);
		return V;
		}

		SDValue SelectionDAG::getScatterEVL(SDVTList VTs, EVT VT, const SDLoc &dl,
		ArrayRef<SDValue> Ops,
		MachineMemOperand *MMO) {
		assert(Ops.size() == 7 && "Incompatible number of operands");

		FoldingSetNodeID ID;
		AddNodeIDNode(ID, ISD::EVL_SCATTER, VTs, Ops);
		ID.AddInteger(VT.getRawBits());
		ID.AddInteger(getSyntheticNodeSubclassData<MaskedScatterSDNode>(
		dl.getIROrder(), VTs, VT, MMO));
		ID.AddInteger(MMO->getPointerInfo().getAddrSpace());
		void *IP = nullptr;
		if (SDNode *E = FindNodeOrInsertPos(ID, dl, IP)) {
		cast<EVLScatterSDNode>(E)->refineAlignment(MMO);
		return SDValue(E, 0);
		}
		auto *N = newSDNode<EVLScatterSDNode>(dl.getIROrder(), dl.getDebugLoc(),
		VTs, VT, MMO);
		createOperands(N, Ops);

		assert(N->getMask().getValueType().getVectorNumElements() ==
		N->getValue().getValueType().getVectorNumElements() &&
		"Vector width mismatch between mask and data");
		assert(N->getIndex().getValueType().getVectorNumElements() >=
		N->getValue().getValueType().getVectorNumElements() &&
		"Vector width mismatch between index and data");
		assert(isa<ConstantSDNode>(N->getScale()) &&
		cast<ConstantSDNode>(N->getScale())->getAPIntValue().isPowerOf2() &&
		"Scale should be a constant power of 2");

		CSEMap.InsertNode(N, IP);
		InsertNode(N);
		SDValue V(N, 0);
		NewSDValueDbgMsg(V, "Creating new node: ", this);
		return V;
		}
SDValue SelectionDAG::getMaskedStore(SDValue Chain, const SDLoc &dl,		SDValue SelectionDAG::getMaskedStore(SDValue Chain, const SDLoc &dl,
SDValue Val, SDValue Ptr, SDValue Mask,		SDValue Val, SDValue Ptr, SDValue Mask,
EVT MemVT, MachineMemOperand *MMO,		EVT MemVT, MachineMemOperand *MMO,
bool IsTruncating, bool IsCompressing) {		bool IsTruncating, bool IsCompressing) {
assert(Chain.getValueType() == MVT::Other &&		assert(Chain.getValueType() == MVT::Other &&
"Invalid chain type");		"Invalid chain type");
EVT VT = Val.getValueType();		EVT VT = Val.getValueType();
SDVTList VTs = getVTList(MVT::Other);		SDVTList VTs = getVTList(MVT::Other);
▲ Show 20 Lines • Show All 2,442 Lines • Show Last 20 Lines

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h

Show First 20 Lines • Show All 928 Lines • ▼ Show 20 Lines	private:
void visitAtomicStore(const StoreInst &I);		void visitAtomicStore(const StoreInst &I);
void visitLoadFromSwiftError(const LoadInst &I);		void visitLoadFromSwiftError(const LoadInst &I);
void visitStoreToSwiftError(const StoreInst &I);		void visitStoreToSwiftError(const StoreInst &I);

void visitInlineAsm(ImmutableCallSite CS);		void visitInlineAsm(ImmutableCallSite CS);
const char *visitIntrinsicCall(const CallInst &I, unsigned Intrinsic);		const char *visitIntrinsicCall(const CallInst &I, unsigned Intrinsic);
void visitTargetIntrinsic(const CallInst &I, unsigned Intrinsic);		void visitTargetIntrinsic(const CallInst &I, unsigned Intrinsic);
void visitConstrainedFPIntrinsic(const ConstrainedFPIntrinsic &FPI);		void visitConstrainedFPIntrinsic(const ConstrainedFPIntrinsic &FPI);
		void visitExplicitVectorLengthIntrinsic(const EVLIntrinsic &EVLI);
		void visitCmpEVL(const EVLIntrinsic &I);
		void visitLoadEVL(const CallInst &I);
		void visitStoreEVL(const CallInst &I);
		void visitGatherEVL(const CallInst &I);
		void visitScatterEVL(const CallInst &I);

void visitVAStart(const CallInst &I);		void visitVAStart(const CallInst &I);
void visitVAArg(const VAArgInst &I);		void visitVAArg(const VAArgInst &I);
void visitVAEnd(const CallInst &I);		void visitVAEnd(const CallInst &I);
void visitVACopy(const CallInst &I);		void visitVACopy(const CallInst &I);
void visitStackmap(const CallInst &I);		void visitStackmap(const CallInst &I);
void visitPatchpoint(ImmutableCallSite CS,		void visitPatchpoint(ImmutableCallSite CS,
const BasicBlock *EHPadBB = nullptr);		const BasicBlock *EHPadBB = nullptr);
▲ Show 20 Lines • Show All 132 Lines • Show Last 20 Lines

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 3,942 Lines • ▼ Show 20 Lines	getMachineMemOperand(MachinePointerInfo(PtrOperand),
Alignment, AAInfo);		Alignment, AAInfo);
SDValue StoreNode = DAG.getMaskedStore(getRoot(), sdl, Src0, Ptr, Mask, VT,		SDValue StoreNode = DAG.getMaskedStore(getRoot(), sdl, Src0, Ptr, Mask, VT,
MMO, false /* Truncating */,		MMO, false /* Truncating */,
IsCompressing);		IsCompressing);
DAG.setRoot(StoreNode);		DAG.setRoot(StoreNode);
setValue(&I, StoreNode);		setValue(&I, StoreNode);
}		}

		void SelectionDAGBuilder::visitStoreEVL(const CallInst &I) {
		SDLoc sdl = getCurSDLoc();

		auto getEVLStoreOps = [&](Value* &Ptr, Value* &Mask, Value* &Src0,
		Value * &VLen) {
		// llvm.masked.store.*(Src0, Ptr, Mask, VLen)
		Src0 = I.getArgOperand(0);
		Ptr = I.getArgOperand(1);
		Mask = I.getArgOperand(2);
		VLen = I.getArgOperand(3);
		};

		Value PtrOperand, MaskOperand, Src0Operand, VLenOperand;
		getEVLStoreOps(PtrOperand, MaskOperand, Src0Operand, VLenOperand);

		unsigned Alignment = 0; // TODO infer alignment

		SDValue Ptr = getValue(PtrOperand);
		SDValue Src0 = getValue(Src0Operand);
		SDValue Mask = getValue(MaskOperand);
		SDValue VLen = getValue(VLenOperand);

		EVT VT = Src0.getValueType();
		if (!Alignment)
		Alignment = DAG.getEVTAlignment(VT);

		AAMDNodes AAInfo;
		I.getAAMetadata(AAInfo);

		MachineMemOperand *MMO =
		DAG.getMachineFunction().
		getMachineMemOperand(MachinePointerInfo(PtrOperand),
		MachineMemOperand::MOStore, VT.getStoreSize(),
		Alignment, AAInfo);
		SDValue StoreNode = DAG.getStoreEVL(getRoot(), sdl, Src0, Ptr, Mask, VLen, VT,
		MMO, false /* Truncating */);
		DAG.setRoot(StoreNode);
		setValue(&I, StoreNode);
		}

// Get a uniform base for the Gather/Scatter intrinsic.		// Get a uniform base for the Gather/Scatter intrinsic.
// The first argument of the Gather/Scatter intrinsic is a vector of pointers.		// The first argument of the Gather/Scatter intrinsic is a vector of pointers.
// We try to represent it as a base pointer + vector of indices.		// We try to represent it as a base pointer + vector of indices.
// Usually, the vector of pointers comes from a 'getelementptr' instruction.		// Usually, the vector of pointers comes from a 'getelementptr' instruction.
// The first operand of the GEP may be a single pointer or a vector of pointers		// The first operand of the GEP may be a single pointer or a vector of pointers
// Example:		// Example:
// %gep.ptr = getelementptr i32, <8 x i32*> %vptr, <8 x i32> %ind		// %gep.ptr = getelementptr i32, <8 x i32*> %vptr, <8 x i32> %ind
// or		// or
▲ Show 20 Lines • Show All 202 Lines • ▼ Show 20 Lines	SDValue Gather = DAG.getMaskedGather(DAG.getVTList(VT, MVT::Other), VT, sdl,
Ops, MMO);		Ops, MMO);

SDValue OutChain = Gather.getValue(1);		SDValue OutChain = Gather.getValue(1);
if (!ConstantMemory)		if (!ConstantMemory)
PendingLoads.push_back(OutChain);		PendingLoads.push_back(OutChain);
setValue(&I, Gather);		setValue(&I, Gather);
}		}

		void SelectionDAGBuilder::visitGatherEVL(const CallInst &I) {
		SDLoc sdl = getCurSDLoc();

		// @llvm.evl.gather.*(Ptrs, Mask, VLen)
		const Value *Ptr = I.getArgOperand(0);
		SDValue Mask = getValue(I.getArgOperand(1));
		SDValue VLen = getValue(I.getArgOperand(2));

		const TargetLowering &TLI = DAG.getTargetLoweringInfo();
		EVT VT = TLI.getValueType(DAG.getDataLayout(), I.getType());
		unsigned Alignment = 0; // TODO infer alignment //(cast<ConstantInt>(I.getArgOperand(1)))->getZExtValue();
		if (!Alignment)
		Alignment = DAG.getEVTAlignment(VT);

		AAMDNodes AAInfo;
		I.getAAMetadata(AAInfo);
		const MDNode *Ranges = I.getMetadata(LLVMContext::MD_range);

		SDValue Root = DAG.getRoot();
		SDValue Base;
		SDValue Index;
		SDValue Scale;
		const Value *BasePtr = Ptr;
		bool UniformBase = getUniformBase(BasePtr, Base, Index, Scale, this);
		bool ConstantMemory = false;
		if (UniformBase && AA &&
		AA->pointsToConstantMemory(
		MemoryLocation(BasePtr,
		LocationSize::precise(
		DAG.getDataLayout().getTypeStoreSize(I.getType())),
		AAInfo))) {
		// Do not serialize (non-volatile) loads of constant memory with anything.
		Root = DAG.getEntryNode();
		ConstantMemory = true;
		}

		MachineMemOperand *MMO =
		DAG.getMachineFunction().
		getMachineMemOperand(MachinePointerInfo(UniformBase ? BasePtr : nullptr),
		MachineMemOperand::MOLoad, VT.getStoreSize(),
		Alignment, AAInfo, Ranges);

		if (!UniformBase) {
		Base = DAG.getConstant(0, sdl, TLI.getPointerTy(DAG.getDataLayout()));
		Index = getValue(Ptr);
		Scale = DAG.getTargetConstant(1, sdl, TLI.getPointerTy(DAG.getDataLayout()));
		}
		SDValue Ops[] = { Root, Base, Index, Scale, Mask, VLen };
		SDValue Gather = DAG.getGatherEVL(DAG.getVTList(VT, MVT::Other), VT, sdl, Ops, MMO);

		SDValue OutChain = Gather.getValue(1);
		if (!ConstantMemory)
		PendingLoads.push_back(OutChain);
		setValue(&I, Gather);
		}

		void SelectionDAGBuilder::visitScatterEVL(const CallInst &I) {
		SDLoc sdl = getCurSDLoc();

		// llvm.evl.scatter.*(Src0, Ptrs, Mask, VLen)
		const Value *Ptr = I.getArgOperand(1);
		SDValue Src0 = getValue(I.getArgOperand(0));
		SDValue Mask = getValue(I.getArgOperand(2));
		SDValue VLen = getValue(I.getArgOperand(3));
		EVT VT = Src0.getValueType();
		unsigned Alignment = 0; // TODO infer alignmen t(cast<ConstantInt>(I.getArgOperand(2)))->getZExtValue();
		if (!Alignment)
		Alignment = DAG.getEVTAlignment(VT);
		const TargetLowering &TLI = DAG.getTargetLoweringInfo();

		AAMDNodes AAInfo;
		I.getAAMetadata(AAInfo);

		SDValue Base;
		SDValue Index;
		SDValue Scale;
		const Value *BasePtr = Ptr;
		bool UniformBase = getUniformBase(BasePtr, Base, Index, Scale, this);

		const Value *MemOpBasePtr = UniformBase ? BasePtr : nullptr;
		MachineMemOperand *MMO = DAG.getMachineFunction().
		getMachineMemOperand(MachinePointerInfo(MemOpBasePtr),
		MachineMemOperand::MOStore, VT.getStoreSize(),
		Alignment, AAInfo);
		if (!UniformBase) {
		Base = DAG.getConstant(0, sdl, TLI.getPointerTy(DAG.getDataLayout()));
		Index = getValue(Ptr);
		Scale = DAG.getTargetConstant(1, sdl, TLI.getPointerTy(DAG.getDataLayout()));
		}
		SDValue Ops[] = { getRoot(), Src0, Base, Index, Scale, Mask, VLen };
		SDValue Scatter = DAG.getScatterEVL(DAG.getVTList(MVT::Other), VT, sdl,
		Ops, MMO);
		DAG.setRoot(Scatter);
		setValue(&I, Scatter);
		}

		void SelectionDAGBuilder::visitLoadEVL(const CallInst &I) {
		SDLoc sdl = getCurSDLoc();

		auto getMaskedLoadOps = [&](Value* &Ptr, Value* &Mask, Value* &VLen,
		unsigned& Alignment) {
		// @llvm.evl.load.*(Ptr, Mask, Vlen)
		Ptr = I.getArgOperand(0);
		Alignment = 0; // TODO infer alignment //Alignment = cast<ConstantInt>(I.getArgOperand(1))->getZExtValue();
		Mask = I.getArgOperand(1);
		VLen = I.getArgOperand(2);
		};

		Value PtrOperand, MaskOperand, *VLenOperand;
		unsigned Alignment;
		getMaskedLoadOps(PtrOperand, MaskOperand, VLenOperand, Alignment);

		SDValue Ptr = getValue(PtrOperand);
		SDValue VLen = getValue(VLenOperand);
		SDValue Mask = getValue(MaskOperand);

		// infer the return type
		const TargetLowering &TLI = DAG.getTargetLoweringInfo();
		SmallVector<EVT, 4> ValValueVTs;
		ComputeValueVTs(TLI, DAG.getDataLayout(), I.getType(), ValValueVTs);
		EVT VT = ValValueVTs[0];
		assert((ValValueVTs.size() == 1) && "splitting not implemented");

		if (!Alignment)
		Alignment = DAG.getEVTAlignment(VT);

		AAMDNodes AAInfo;
		I.getAAMetadata(AAInfo);
		const MDNode *Ranges = I.getMetadata(LLVMContext::MD_range);

		// Do not serialize masked loads of constant memory with anything.
		bool AddToChain =
		!AA \|\| !AA->pointsToConstantMemory(MemoryLocation(
		PtrOperand,
		LocationSize::precise(
		DAG.getDataLayout().getTypeStoreSize(I.getType())),
		AAInfo));
		SDValue InChain = AddToChain ? DAG.getRoot() : DAG.getEntryNode();

		MachineMemOperand *MMO =
		DAG.getMachineFunction().
		getMachineMemOperand(MachinePointerInfo(PtrOperand),
		MachineMemOperand::MOLoad, VT.getStoreSize(),
		Alignment, AAInfo, Ranges);

		SDValue Load = DAG.getLoadEVL(VT, sdl, InChain, Ptr, Mask, VLen, VT, MMO,
		ISD::NON_EXTLOAD);
		if (AddToChain)
		PendingLoads.push_back(Load.getValue(1));
		setValue(&I, Load);
		}

void SelectionDAGBuilder::visitAtomicCmpXchg(const AtomicCmpXchgInst &I) {		void SelectionDAGBuilder::visitAtomicCmpXchg(const AtomicCmpXchgInst &I) {
SDLoc dl = getCurSDLoc();		SDLoc dl = getCurSDLoc();
AtomicOrdering SuccessOrder = I.getSuccessOrdering();		AtomicOrdering SuccessOrder = I.getSuccessOrdering();
AtomicOrdering FailureOrder = I.getFailureOrdering();		AtomicOrdering FailureOrder = I.getFailureOrdering();
SyncScope::ID SSID = I.getSyncScopeID();		SyncScope::ID SSID = I.getSyncScopeID();

SDValue InChain = getRoot();		SDValue InChain = getRoot();

▲ Show 20 Lines • Show All 1,480 Lines • ▼ Show 20 Lines	SelectionDAGBuilder::visitIntrinsicCall(const CallInst &I, unsigned Intrinsic) {
case Intrinsic::experimental_constrained_maxnum:		case Intrinsic::experimental_constrained_maxnum:
case Intrinsic::experimental_constrained_minnum:		case Intrinsic::experimental_constrained_minnum:
case Intrinsic::experimental_constrained_ceil:		case Intrinsic::experimental_constrained_ceil:
case Intrinsic::experimental_constrained_floor:		case Intrinsic::experimental_constrained_floor:
case Intrinsic::experimental_constrained_round:		case Intrinsic::experimental_constrained_round:
case Intrinsic::experimental_constrained_trunc:		case Intrinsic::experimental_constrained_trunc:
visitConstrainedFPIntrinsic(cast<ConstrainedFPIntrinsic>(I));		visitConstrainedFPIntrinsic(cast<ConstrainedFPIntrinsic>(I));
return nullptr;		return nullptr;

		case Intrinsic::evl_and:
		case Intrinsic::evl_or:
		case Intrinsic::evl_xor:
		case Intrinsic::evl_ashr:
		case Intrinsic::evl_lshr:
		case Intrinsic::evl_shl:

		case Intrinsic::evl_select:
		case Intrinsic::evl_compose:
		case Intrinsic::evl_compress:
		case Intrinsic::evl_expand:
		case Intrinsic::evl_vshift:

		case Intrinsic::evl_load:
		case Intrinsic::evl_store:
		case Intrinsic::evl_gather:
		case Intrinsic::evl_scatter:

		case Intrinsic::evl_fneg:

		case Intrinsic::evl_fadd:
		case Intrinsic::evl_fsub:
		case Intrinsic::evl_fmul:
		case Intrinsic::evl_fdiv:
		case Intrinsic::evl_frem:

		case Intrinsic::evl_fma:

		case Intrinsic::evl_add:
		case Intrinsic::evl_sub:
		case Intrinsic::evl_mul:
		case Intrinsic::evl_udiv:
		case Intrinsic::evl_sdiv:
		case Intrinsic::evl_urem:
		case Intrinsic::evl_srem:

		case Intrinsic::evl_cmp:

		case Intrinsic::evl_reduce_and:
		case Intrinsic::evl_reduce_or:
		case Intrinsic::evl_reduce_xor:

		case Intrinsic::evl_reduce_fadd:
		case Intrinsic::evl_reduce_fmax:
		case Intrinsic::evl_reduce_fmin:
		case Intrinsic::evl_reduce_fmul:

		case Intrinsic::evl_reduce_add:
		case Intrinsic::evl_reduce_mul:
		case Intrinsic::evl_reduce_umax:
		case Intrinsic::evl_reduce_umin:
		case Intrinsic::evl_reduce_smax:
		case Intrinsic::evl_reduce_smin:
		visitExplicitVectorLengthIntrinsic(cast<EVLIntrinsic>(I));
		return nullptr;

case Intrinsic::fmuladd: {		case Intrinsic::fmuladd: {
EVT VT = TLI.getValueType(DAG.getDataLayout(), I.getType());		EVT VT = TLI.getValueType(DAG.getDataLayout(), I.getType());
if (TM.Options.AllowFPOpFusion != FPOpFusion::Strict &&		if (TM.Options.AllowFPOpFusion != FPOpFusion::Strict &&
TLI.isFMAFasterThanFMulAndFAdd(VT)) {		TLI.isFMAFasterThanFMulAndFAdd(VT)) {
setValue(&I, DAG.getNode(ISD::FMA, sdl,		setValue(&I, DAG.getNode(ISD::FMA, sdl,
getValue(I.getArgOperand(0)).getValueType(),		getValue(I.getArgOperand(0)).getValueType(),
getValue(I.getArgOperand(0)),		getValue(I.getArgOperand(0)),
getValue(I.getArgOperand(1)),		getValue(I.getArgOperand(1)),
▲ Show 20 Lines • Show All 797 Lines • ▼ Show 20 Lines	void SelectionDAGBuilder::visitConstrainedFPIntrinsic(

assert(Result.getNode()->getNumValues() == 2);		assert(Result.getNode()->getNumValues() == 2);
SDValue OutChain = Result.getValue(1);		SDValue OutChain = Result.getValue(1);
DAG.setRoot(OutChain);		DAG.setRoot(OutChain);
SDValue FPResult = Result.getValue(0);		SDValue FPResult = Result.getValue(0);
setValue(&FPI, FPResult);		setValue(&FPI, FPResult);
}		}

		void SelectionDAGBuilder::visitCmpEVL(const EVLIntrinsic &I) {
		ISD::CondCode Condition;
		CmpInst::Predicate predicate = I.getCmpPredicate();
		bool IsFP = I.getOperand(0)->getType()->isFPOrFPVectorTy();
		if (IsFP) {
		Condition = getFCmpCondCode(predicate);
		auto *FPMO = dyn_cast<FPMathOperator>(&I);
		if ((FPMO && FPMO->hasNoNaNs()) \|\| TM.Options.NoNaNsFPMath)
		Condition = getFCmpCodeWithoutNaN(Condition);

		} else {
		Condition = getICmpCondCode(predicate);
		}

		SDValue Op1 = getValue(I.getOperand(0));
		SDValue Op2 = getValue(I.getOperand(1));

		EVT DestVT = DAG.getTargetLoweringInfo().getValueType(DAG.getDataLayout(),
		I.getType());
		setValue(&I, DAG.getSetCC(getCurSDLoc(), DestVT, Op1, Op2, Condition));
		}

		void SelectionDAGBuilder::visitExplicitVectorLengthIntrinsic(
		const EVLIntrinsic & EVLInst) {
		SDLoc sdl = getCurSDLoc();
		unsigned Opcode;
		switch (EVLInst.getIntrinsicID()) {
		default:
		llvm_unreachable("Unforeseen intrinsic"); // Can't reach here.

		case Intrinsic::evl_load: visitLoadEVL(EVLInst); return;
		case Intrinsic::evl_store: visitStoreEVL(EVLInst); return;
		case Intrinsic::evl_gather: visitGatherEVL(EVLInst); return;
		case Intrinsic::evl_scatter: visitScatterEVL(EVLInst); return;

		case Intrinsic::evl_cmp: visitCmpEVL(EVLInst); return;

		case Intrinsic::evl_add: Opcode = ISD::EVL_ADD; break;
		case Intrinsic::evl_sub: Opcode = ISD::EVL_SUB; break;
		case Intrinsic::evl_mul: Opcode = ISD::EVL_MUL; break;
		case Intrinsic::evl_udiv: Opcode = ISD::EVL_UDIV; break;
		case Intrinsic::evl_sdiv: Opcode = ISD::EVL_SDIV; break;
		case Intrinsic::evl_urem: Opcode = ISD::EVL_UREM; break;
		case Intrinsic::evl_srem: Opcode = ISD::EVL_SREM; break;

		case Intrinsic::evl_and: Opcode = ISD::EVL_AND; break;
		case Intrinsic::evl_or: Opcode = ISD::EVL_OR; break;
		case Intrinsic::evl_xor: Opcode = ISD::EVL_XOR; break;
		case Intrinsic::evl_ashr: Opcode = ISD::EVL_SRA; break;
		case Intrinsic::evl_lshr: Opcode = ISD::EVL_SRL; break;
		case Intrinsic::evl_shl: Opcode = ISD::EVL_SHL; break;

		case Intrinsic::evl_fneg: Opcode = ISD::EVL_FNEG; break;
		case Intrinsic::evl_fadd: Opcode = ISD::EVL_FADD; break;
		case Intrinsic::evl_fsub: Opcode = ISD::EVL_FSUB; break;
		case Intrinsic::evl_fmul: Opcode = ISD::EVL_FMUL; break;
		case Intrinsic::evl_fdiv: Opcode = ISD::EVL_FDIV; break;
		case Intrinsic::evl_frem: Opcode = ISD::EVL_FREM; break;

		case Intrinsic::evl_fma: Opcode = ISD::EVL_FMA; break;

		case Intrinsic::evl_select: Opcode = ISD::EVL_SELECT; break;
		case Intrinsic::evl_compose: Opcode = ISD::EVL_COMPOSE; break;
		case Intrinsic::evl_compress: Opcode = ISD::EVL_COMPRESS; break;
		case Intrinsic::evl_expand: Opcode = ISD::EVL_EXPAND; break;
		case Intrinsic::evl_vshift: Opcode = ISD::EVL_VSHIFT; break;

		case Intrinsic::evl_reduce_and: Opcode = ISD::EVL_REDUCE_AND; break;
		case Intrinsic::evl_reduce_or: Opcode = ISD::EVL_REDUCE_OR; break;
		case Intrinsic::evl_reduce_xor: Opcode = ISD::EVL_REDUCE_XOR; break;
		case Intrinsic::evl_reduce_add: Opcode = ISD::EVL_REDUCE_ADD; break;
		case Intrinsic::evl_reduce_mul: Opcode = ISD::EVL_REDUCE_MUL; break;
		case Intrinsic::evl_reduce_fadd: Opcode = ISD::EVL_REDUCE_FADD; break;
		case Intrinsic::evl_reduce_fmul: Opcode = ISD::EVL_REDUCE_FMUL; break;
		case Intrinsic::evl_reduce_smax: Opcode = ISD::EVL_REDUCE_SMAX; break;
		case Intrinsic::evl_reduce_smin: Opcode = ISD::EVL_REDUCE_SMIN; break;
		case Intrinsic::evl_reduce_umax: Opcode = ISD::EVL_REDUCE_UMAX; break;
		case Intrinsic::evl_reduce_umin: Opcode = ISD::EVL_REDUCE_UMIN; break;
		}

		// TODO memory evl: SDValue Chain = getRoot();

		SmallVector<EVT, 4> ValueVTs;
		const TargetLowering &TLI = DAG.getTargetLoweringInfo();
		ComputeValueVTs(TLI, DAG.getDataLayout(), EVLInst.getType(), ValueVTs);
		SDVTList VTs = DAG.getVTList(ValueVTs);

		// ValueVTs.push_back(MVT::Other); // Out chain


		SDValue Result;

		switch (EVLInst.getNumArgOperands()) {
		default:
		llvm_unreachable("unexpected number of arguments to evl intrinsic");
		case 3:
		Result = DAG.getNode(Opcode, sdl, VTs,
		{ getValue(EVLInst.getArgOperand(0)),
		getValue(EVLInst.getArgOperand(1)),
		getValue(EVLInst.getArgOperand(2)) });
		break;

		case 4:
		Result = DAG.getNode(Opcode, sdl, VTs,
		{ getValue(EVLInst.getArgOperand(0)),
		getValue(EVLInst.getArgOperand(1)),
		getValue(EVLInst.getArgOperand(2)),
		getValue(EVLInst.getArgOperand(3)) });
		break;

		case 5:
		Result = DAG.getNode(Opcode, sdl, VTs,
		{ getValue(EVLInst.getArgOperand(0)),
		getValue(EVLInst.getArgOperand(1)),
		getValue(EVLInst.getArgOperand(2)),
		getValue(EVLInst.getArgOperand(3)),
		getValue(EVLInst.getArgOperand(4)) });
		break;
		}

		if (Result.getNode()->getNumValues() == 2) {
		// this evl node has a chain
		SDValue OutChain = Result.getValue(1);
		DAG.setRoot(OutChain);
		SDValue EVLResult = Result.getValue(0);
		setValue(&EVLInst, EVLResult);
		} else {
		// this is a pure node
		setValue(&EVLInst, Result);
		}
		}

std::pair<SDValue, SDValue>		std::pair<SDValue, SDValue>
SelectionDAGBuilder::lowerInvokable(TargetLowering::CallLoweringInfo &CLI,		SelectionDAGBuilder::lowerInvokable(TargetLowering::CallLoweringInfo &CLI,
const BasicBlock *EHPadBB) {		const BasicBlock *EHPadBB) {
MachineFunction &MF = DAG.getMachineFunction();		MachineFunction &MF = DAG.getMachineFunction();
MachineModuleInfo &MMI = MF.getMMI();		MachineModuleInfo &MMI = MF.getMMI();
MCSymbol *BeginLabel = nullptr;		MCSymbol *BeginLabel = nullptr;

if (EHPadBB) {		if (EHPadBB) {
▲ Show 20 Lines • Show All 3,917 Lines • Show Last 20 Lines

lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp

Show First 20 Lines • Show All 415 Lines • ▼ Show 20 Lines	#endif
case ISD::VECREDUCE_OR: return "vecreduce_or";		case ISD::VECREDUCE_OR: return "vecreduce_or";
case ISD::VECREDUCE_XOR: return "vecreduce_xor";		case ISD::VECREDUCE_XOR: return "vecreduce_xor";
case ISD::VECREDUCE_SMAX: return "vecreduce_smax";		case ISD::VECREDUCE_SMAX: return "vecreduce_smax";
case ISD::VECREDUCE_SMIN: return "vecreduce_smin";		case ISD::VECREDUCE_SMIN: return "vecreduce_smin";
case ISD::VECREDUCE_UMAX: return "vecreduce_umax";		case ISD::VECREDUCE_UMAX: return "vecreduce_umax";
case ISD::VECREDUCE_UMIN: return "vecreduce_umin";		case ISD::VECREDUCE_UMIN: return "vecreduce_umin";
case ISD::VECREDUCE_FMAX: return "vecreduce_fmax";		case ISD::VECREDUCE_FMAX: return "vecreduce_fmax";
case ISD::VECREDUCE_FMIN: return "vecreduce_fmin";		case ISD::VECREDUCE_FMIN: return "vecreduce_fmin";

		// Explicit Vector Length erxtension
		// EVL Memory
		case ISD::EVL_LOAD: return "evl_load";
		case ISD::EVL_STORE: return "evl_store";
		case ISD::EVL_GATHER: return "evl_gather";
		case ISD::EVL_SCATTER: return "evl_scatter";

		// EVL Unary operators
		case ISD::EVL_FNEG: return "evl_fneg";

		// EVL Binary operators
		case ISD::EVL_ADD: return "evl_add";
		case ISD::EVL_SUB: return "evl_sub";
		case ISD::EVL_MUL: return "evl_mul";
		case ISD::EVL_SDIV: return "evl_sdiv";
		case ISD::EVL_UDIV: return "evl_udiv";
		case ISD::EVL_SREM: return "evl_srem";
		case ISD::EVL_UREM: return "evl_urem";
		case ISD::EVL_AND: return "evl_and";
		case ISD::EVL_OR: return "evl_or";
		case ISD::EVL_XOR: return "evl_xor";
		case ISD::EVL_SHL: return "evl_shl";
		case ISD::EVL_SRA: return "evl_sra";
		case ISD::EVL_SRL: return "evl_srl";
		case ISD::EVL_FADD: return "evl_fadd";
		case ISD::EVL_FSUB: return "evl_fsub";
		case ISD::EVL_FMUL: return "evl_fmul";
		case ISD::EVL_FDIV: return "evl_fdiv";
		case ISD::EVL_FREM: return "evl_frem";

		// EVL comparison
		case ISD::EVL_SETCC: return "evl_setcc";

		// EVL ternary operators
		case ISD::EVL_FMA: return "evl_fma";

		// EVL shuffle
		case ISD::EVL_VSHIFT: return "evl_vshift";
		case ISD::EVL_COMPRESS: return "evl_compress";
		case ISD::EVL_EXPAND: return "evl_expand";

		case ISD::EVL_COMPOSE: return "evl_compose";
		case ISD::EVL_SELECT: return "evl_select";

		// EVL reduction operators
		case ISD::EVL_REDUCE_FADD: return "evl_reduce_fadd";
		case ISD::EVL_REDUCE_FMUL: return "evl_reduce_fmul";
		case ISD::EVL_REDUCE_ADD: return "evl_reduce_add";
		case ISD::EVL_REDUCE_MUL: return "evl_reduce_mul";
		case ISD::EVL_REDUCE_AND: return "evl_reduce_and";
		case ISD::EVL_REDUCE_OR: return "evl_reduce_or";
		case ISD::EVL_REDUCE_XOR: return "evl_reduce_xor";
		case ISD::EVL_REDUCE_SMAX: return "evl_reduce_smax";
		case ISD::EVL_REDUCE_SMIN: return "evl_reduce_smin";
		case ISD::EVL_REDUCE_UMAX: return "evl_reduce_umax";
		case ISD::EVL_REDUCE_UMIN: return "evl_reduce_umin";
		case ISD::EVL_REDUCE_FMAX: return "evl_reduce_fmax";
		case ISD::EVL_REDUCE_FMIN: return "evl_reduce_fmin";
}		}
}		}

const char *SDNode::getIndexedModeName(ISD::MemIndexedMode AM) {		const char *SDNode::getIndexedModeName(ISD::MemIndexedMode AM) {
switch (AM) {		switch (AM) {
default: return "";		default: return "";
case ISD::PRE_INC: return "<pre-inc>";		case ISD::PRE_INC: return "<pre-inc>";
case ISD::PRE_DEC: return "<pre-dec>";		case ISD::PRE_DEC: return "<pre-dec>";
▲ Show 20 Lines • Show All 481 Lines • Show Last 20 Lines

lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp

Show First 20 Lines • Show All 747 Lines • ▼ Show 20 Lines	#endif

// Run the DAG combiner in pre-legalize mode.		// Run the DAG combiner in pre-legalize mode.
{		{
NamedRegionTimer T("combine1", "DAG Combining 1", GroupName,		NamedRegionTimer T("combine1", "DAG Combining 1", GroupName,
GroupDescription, TimePassesIsEnabled);		GroupDescription, TimePassesIsEnabled);
CurDAG->Combine(BeforeLegalizeTypes, AA, OptLevel);		CurDAG->Combine(BeforeLegalizeTypes, AA, OptLevel);
}		}

		if (getenv("SDEBUG")) {
		CurDAG->dump();
		}

#ifndef NDEBUG		#ifndef NDEBUG
if (TTI.hasBranchDivergence())		if (TTI.hasBranchDivergence())
CurDAG->VerifyDAGDiverence();		CurDAG->VerifyDAGDiverence();
#endif		#endif

LLVM_DEBUG(dbgs() << "Optimized lowered selection DAG: "		LLVM_DEBUG(dbgs() << "Optimized lowered selection DAG: "
<< printMBBReference(*FuncInfo->MBB) << " '" << BlockName		<< printMBBReference(*FuncInfo->MBB) << " '" << BlockName
<< "'\n";		<< "'\n";
▲ Show 20 Lines • Show All 3,074 Lines • Show Last 20 Lines

lib/IR/Attributes.cpp

Show First 20 Lines • Show All 250 Lines • ▼ Show 20 Lines	std::string Attribute::getAsString(bool InAttrGrp) const {
if (hasAttribute(Attribute::ArgMemOnly))		if (hasAttribute(Attribute::ArgMemOnly))
return "argmemonly";		return "argmemonly";
if (hasAttribute(Attribute::Builtin))		if (hasAttribute(Attribute::Builtin))
return "builtin";		return "builtin";
if (hasAttribute(Attribute::ByVal))		if (hasAttribute(Attribute::ByVal))
return "byval";		return "byval";
if (hasAttribute(Attribute::Convergent))		if (hasAttribute(Attribute::Convergent))
return "convergent";		return "convergent";
		if (hasAttribute(Attribute::VectorLength))
		return "vlen";
if (hasAttribute(Attribute::SwiftError))		if (hasAttribute(Attribute::SwiftError))
return "swifterror";		return "swifterror";
if (hasAttribute(Attribute::SwiftSelf))		if (hasAttribute(Attribute::SwiftSelf))
return "swiftself";		return "swiftself";
if (hasAttribute(Attribute::InaccessibleMemOnly))		if (hasAttribute(Attribute::InaccessibleMemOnly))
return "inaccessiblememonly";		return "inaccessiblememonly";
if (hasAttribute(Attribute::InaccessibleMemOrArgMemOnly))		if (hasAttribute(Attribute::InaccessibleMemOrArgMemOnly))
return "inaccessiblemem_or_argmemonly";		return "inaccessiblemem_or_argmemonly";
if (hasAttribute(Attribute::InAlloca))		if (hasAttribute(Attribute::InAlloca))
return "inalloca";		return "inalloca";
if (hasAttribute(Attribute::InlineHint))		if (hasAttribute(Attribute::InlineHint))
return "inlinehint";		return "inlinehint";
if (hasAttribute(Attribute::InReg))		if (hasAttribute(Attribute::InReg))
return "inreg";		return "inreg";
if (hasAttribute(Attribute::JumpTable))		if (hasAttribute(Attribute::JumpTable))
return "jumptable";		return "jumptable";
		if (hasAttribute(Attribute::Mask))
		return "mask";
		if (hasAttribute(Attribute::Passthru))
		return "passthru";
if (hasAttribute(Attribute::MinSize))		if (hasAttribute(Attribute::MinSize))
return "minsize";		return "minsize";
if (hasAttribute(Attribute::Naked))		if (hasAttribute(Attribute::Naked))
return "naked";		return "naked";
if (hasAttribute(Attribute::Nest))		if (hasAttribute(Attribute::Nest))
return "nest";		return "nest";
if (hasAttribute(Attribute::NoAlias))		if (hasAttribute(Attribute::NoAlias))
return "noalias";		return "noalias";
▲ Show 20 Lines • Show All 1,455 Lines • Show Last 20 Lines

lib/IR/CMakeLists.txt

Show All 17 Lines	add_llvm_library(LLVMCore
DebugInfo.cpp		DebugInfo.cpp
DebugInfoMetadata.cpp		DebugInfoMetadata.cpp
DebugLoc.cpp		DebugLoc.cpp
DiagnosticHandler.cpp		DiagnosticHandler.cpp
DiagnosticInfo.cpp		DiagnosticInfo.cpp
DiagnosticPrinter.cpp		DiagnosticPrinter.cpp
Dominators.cpp		Dominators.cpp
DomTreeUpdater.cpp		DomTreeUpdater.cpp
		EVLBuilder.cpp
Function.cpp		Function.cpp
GVMaterializer.cpp		GVMaterializer.cpp
Globals.cpp		Globals.cpp
IRBuilder.cpp		IRBuilder.cpp
IRPrintingPasses.cpp		IRPrintingPasses.cpp
InlineAsm.cpp		InlineAsm.cpp
Instruction.cpp		Instruction.cpp
Instructions.cpp		Instructions.cpp
Show All 33 Lines

lib/IR/EVLBuilder.cpp

This file was added.

				#include <llvm/IR/EVLBuilder.h>
				#include <llvm/IR/Intrinsics.h>
				#include <llvm/IR/Instructions.h>

				#include <llvm/ADT/SmallVector.h>

				namespace llvm {

				Module &
				EVLBuilder::getModule() const {
				return *Builder.GetInsertBlock()->getParent()->getParent();
				}

				EVLIntrinsicDesc
				EVLBuilder::GetEVLIntrinsicDesc(unsigned OC) {
				switch (OC) {
				// fp unary
				case Instruction::FNeg: return EVLIntrinsicDesc{ Intrinsic::evl_fneg, TypeTokenVec{EVLTypeToken::Vector}, 1, 2}; break;

				// fp binary
				case Instruction::FAdd: return EVLIntrinsicDesc{ Intrinsic::evl_fadd, TypeTokenVec{EVLTypeToken::Vector}, 2, 3}; break;
				case Instruction::FSub: return EVLIntrinsicDesc{ Intrinsic::evl_fsub, TypeTokenVec{EVLTypeToken::Vector}, 2, 3}; break;
				case Instruction::FMul: return EVLIntrinsicDesc{ Intrinsic::evl_fmul, TypeTokenVec{EVLTypeToken::Vector}, 2, 3}; break;
				case Instruction::FDiv: return EVLIntrinsicDesc{ Intrinsic::evl_fdiv, TypeTokenVec{EVLTypeToken::Vector}, 2, 3}; break;
				case Instruction::FRem: return EVLIntrinsicDesc{ Intrinsic::evl_frem, TypeTokenVec{EVLTypeToken::Vector}, 2, 3}; break;

				// sign-oblivious int
				case Instruction::Add: return EVLIntrinsicDesc{ Intrinsic::evl_add, TypeTokenVec{EVLTypeToken::Vector}, 2, 3}; break;
				case Instruction::Sub: return EVLIntrinsicDesc{ Intrinsic::evl_sub, TypeTokenVec{EVLTypeToken::Vector}, 2, 3}; break;
				case Instruction::Mul: return EVLIntrinsicDesc{ Intrinsic::evl_mul, TypeTokenVec{EVLTypeToken::Vector}, 2, 3}; break;

				// signed/unsigned int
				case Instruction::SDiv: return EVLIntrinsicDesc{ Intrinsic::evl_sdiv, TypeTokenVec{EVLTypeToken::Vector}, 2, 3}; break;
				case Instruction::UDiv: return EVLIntrinsicDesc{ Intrinsic::evl_udiv, TypeTokenVec{EVLTypeToken::Vector}, 2, 3}; break;
				case Instruction::SRem: return EVLIntrinsicDesc{ Intrinsic::evl_srem, TypeTokenVec{EVLTypeToken::Vector}, 2, 3}; break;
				case Instruction::URem: return EVLIntrinsicDesc{ Intrinsic::evl_urem, TypeTokenVec{EVLTypeToken::Vector}, 2, 3}; break;

				// logical
				case Instruction::Or: return EVLIntrinsicDesc{ Intrinsic::evl_or, TypeTokenVec{EVLTypeToken::Vector}, 2, 3}; break;
				case Instruction::And: return EVLIntrinsicDesc{ Intrinsic::evl_and, TypeTokenVec{EVLTypeToken::Vector}, 2, 3}; break;
				case Instruction::Xor: return EVLIntrinsicDesc{ Intrinsic::evl_xor, TypeTokenVec{EVLTypeToken::Vector}, 2, 3}; break;

				case Instruction::LShr: return EVLIntrinsicDesc{ Intrinsic::evl_lshr, TypeTokenVec{EVLTypeToken::Vector}, 2, 3}; break;
				case Instruction::AShr: return EVLIntrinsicDesc{ Intrinsic::evl_ashr, TypeTokenVec{EVLTypeToken::Vector}, 2, 3}; break;
				case Instruction::Shl: return EVLIntrinsicDesc{ Intrinsic::evl_shl, TypeTokenVec{EVLTypeToken::Vector}, 2, 3}; break;

				// comparison
				case Instruction::ICmp:
				case Instruction::FCmp:
				return EVLIntrinsicDesc{ Intrinsic::evl_cmp, TypeTokenVec{EVLTypeToken::Mask, EVLTypeToken::Vector}, 2, 3}; break;

				default:
				return EVLIntrinsicDesc{Intrinsic::not_intrinsic, TypeTokenVec(), -1, -1};
				}
				}

				static
				ShortTypeVec
				EncodeTypeTokens(TypeTokenVec TTVec, Type & VectorTy, Type & ScalarTy) {
				ShortTypeVec STV;

				for (auto Token : TTVec) {
				switch (Token) {
				default:
				llvm_unreachable("unsupported token"); // unsupported EVLTypeToken

				case EVLTypeToken::Scalar: STV.push_back(&ScalarTy); break;
				case EVLTypeToken::Vector: STV.push_back(&VectorTy); break;
				case EVLTypeToken::Mask:
				auto NumElems = VectorTy.getVectorNumElements();
				auto MaskTy = VectorType::get(Type::getInt1Ty(VectorTy.getContext()), NumElems);
				STV.push_back(MaskTy); break;

				}
				}

				return STV;
				}

				Value&
				EVLBuilder::GetMaskForType(VectorType & VecTy) {
				if (Mask) return *Mask;

				auto * boolTy = Builder.getInt1Ty();
				auto * maskTy = VectorType::get(boolTy, StaticVectorLength);
				return *ConstantInt::getAllOnesValue(maskTy);
				}

				Value&
				EVLBuilder::GetEVLForType(VectorType & VecTy) {
				if (ExplicitVectorLength) return *ExplicitVectorLength;

				// TODO SVE
				auto * intTy = Builder.getInt32Ty();
				return *ConstantInt::get(intTy, StaticVectorLength);
				}

				Value*
				EVLBuilder::CreateVectorCopy(Instruction & Inst, ValArray VecOpArray) {
				auto oc = Inst.getOpcode();

				auto evlDesc = GetEVLIntrinsicDesc(oc);
				if (evlDesc.ID == Intrinsic::not_intrinsic) {
				return nullptr;
				}

				if ((oc <= Instruction::BinaryOpsEnd) &&
				(oc >= Instruction::BinaryOpsBegin)) {

				assert(VecOpArray.size() == 2);
				Value & FirstOp = *VecOpArray[0];
				Value & SndOp = *VecOpArray[1];

				// Fetch the EVL intrinsic
				auto & VecTy = cast<VectorType>(*FirstOp.getType());
				auto & ScalarTy = *VecTy.getVectorElementType();
				auto * Func = Intrinsic::getDeclaration(&getModule(), evlDesc.ID, EncodeTypeTokens(evlDesc.typeTokens, VecTy, ScalarTy));

				assert((evlDesc.MaskPos == 2) && (evlDesc.EVLPos == 3));

				// Materialize the Call
				ShortValueVec Args{&FirstOp, &SndOp, &GetMaskForType(VecTy), &GetEVLForType(VecTy)};

				auto & EVLCall = *Builder.CreateCall(Func, Args);

				// transfer fast math flags
				if (isa<FPMathOperator>(Inst)) {
				cast<CallInst>(EVLCall).copyFastMathFlags(Inst.getFastMathFlags());
				}

				return &EVLCall;
				}

				if ((oc <= Instruction::UnaryOpsBegin) &&
				(oc >= Instruction::UnaryOpsEnd)) {
				assert(VecOpArray.size() == 1);
				Value & FirstOp = *VecOpArray[0];

				// Fetch the EVL intrinsic
				auto & VecTy = cast<VectorType>(*FirstOp.getType());
				auto & ScalarTy = *VecTy.getVectorElementType();
				auto * Func = Intrinsic::getDeclaration(&getModule(), evlDesc.ID, EncodeTypeTokens(evlDesc.typeTokens, VecTy, ScalarTy));

				assert((evlDesc.MaskPos == 1) && (evlDesc.EVLPos == 2));

				// Materialize the Call
				ShortValueVec Args{&FirstOp, &GetMaskForType(VecTy), &GetEVLForType(VecTy)};

				auto & EVLCall = *Builder.CreateCall(Func, Args);

				// transfer fast math flags
				if (isa<FPMathOperator>(Inst)) {
				cast<CallInst>(EVLCall).copyFastMathFlags(Inst.getFastMathFlags());
				}

				return &EVLCall;
				}

				switch (oc) {
				default:
				return nullptr;

				case Instruction::FCmp:
				case Instruction::ICmp: {
				assert(VecOpArray.size() == 2);
				Value & FirstOp = *VecOpArray[0];
				Value & SndOp = *VecOpArray[1];

				// Fetch the EVL intrinsic
				auto & VecTy = cast<VectorType>(*FirstOp.getType());
				auto & ScalarTy = *VecTy.getVectorElementType();
				auto * Func = Intrinsic::getDeclaration(&getModule(), evlDesc.ID, EncodeTypeTokens(evlDesc.typeTokens, VecTy, ScalarTy));

				assert((evlDesc.MaskPos == 2) && (evlDesc.EVLPos == 3));

				// encode comparison predicate as MD
				uint8_t RawPred = cast<CmpInst>(Inst).getPredicate();
				auto Int8Ty = Builder.getInt8Ty();
				auto PredArg = ConstantInt::get(Int8Ty, RawPred, false);

				// Materialize the Call
				ShortValueVec Args{&FirstOp, &SndOp, &GetMaskForType(VecTy), &GetEVLForType(VecTy), PredArg};

				return Builder.CreateCall(Func, Args);
				}

				case Instruction::Select: {
				assert(VecOpArray.size() == 2);
				Value & MaskOp = *VecOpArray[0];
				Value & OnTrueOp = *VecOpArray[1];
				Value & OnFalseOp = *VecOpArray[2];

				// Fetch the EVL intrinsic
				auto & VecTy = cast<VectorType>(*OnTrueOp.getType());
				auto & ScalarTy = *VecTy.getVectorElementType();

				auto * Func = Intrinsic::getDeclaration(&getModule(), evlDesc.ID, EncodeTypeTokens(evlDesc.typeTokens, VecTy, ScalarTy));

				assert((evlDesc.MaskPos == 2) && (evlDesc.EVLPos == 3));

				// Materialize the Call
				ShortValueVec Args{&OnTrueOp, &OnFalseOp, &MaskOp, &GetEVLForType(VecTy)};

				return Builder.CreateCall(Func, Args);
				}
				}
				}

				VectorType&
				EVLBuilder::getVectorType(Type &ElementTy) {
				return *VectorType::get(&ElementTy, StaticVectorLength);
				}

				Value&
				EVLBuilder::CreateContiguousStore(Value & Val, Value & Pointer) {
				auto & VecTy = cast<VectorType>(*Val.getType());
				auto * StoreFunc = Intrinsic::getDeclaration(&getModule(), Intrinsic::evl_store, {Val.getType(), Pointer.getType()});
				ShortValueVec Args{&Val, &Pointer, &GetMaskForType(VecTy), &GetEVLForType(VecTy)};
				return *Builder.CreateCall(StoreFunc, Args);
				}

				Value&
				EVLBuilder::CreateContiguousLoad(Value & Pointer) {
				auto & PointerTy = cast<PointerType>(*Pointer.getType());
				auto & VecTy = getVectorType(*PointerTy.getPointerElementType());

				auto * LoadFunc = Intrinsic::getDeclaration(&getModule(), Intrinsic::evl_load, {&VecTy, &PointerTy});
				ShortValueVec Args{&Pointer, &GetMaskForType(VecTy), &GetEVLForType(VecTy)};
				return *Builder.CreateCall(LoadFunc, Args);
				}

				Value&
				EVLBuilder::CreateScatter(Value & Val, Value & PointerVec) {
				auto & VecTy = cast<VectorType>(*Val.getType());
				auto * ScatterFunc = Intrinsic::getDeclaration(&getModule(), Intrinsic::evl_scatter, {Val.getType(), PointerVec.getType()});
				ShortValueVec Args{&Val, &PointerVec, &GetMaskForType(VecTy), &GetEVLForType(VecTy)};
				return *Builder.CreateCall(ScatterFunc, Args);
				}

				Value&
				EVLBuilder::CreateGather(Value & PointerVec) {
				auto & PointerVecTy = cast<VectorType>(*PointerVec.getType());
				auto & ElemTy = cast<PointerType>(PointerVecTy.getVectorElementType()).getPointerElementType();
				auto & VecTy = *VectorType::get(&ElemTy, PointerVecTy.getNumElements());
				auto * GatherFunc = Intrinsic::getDeclaration(&getModule(), Intrinsic::evl_gather, {&VecTy, &PointerVecTy});

				ShortValueVec Args{&PointerVec, &GetMaskForType(VecTy), &GetEVLForType(VecTy)};
				return *Builder.CreateCall(GatherFunc, Args);
				}

				} // namespace llvm

lib/IR/IntrinsicInst.cpp

Show First 20 Lines • Show All 131 Lines • ▼ Show 20 Lines	ConstrainedFPIntrinsic::getExceptionBehavior() const {
StringRef ExceptionArg = cast<MDString>(MD)->getString();		StringRef ExceptionArg = cast<MDString>(MD)->getString();
return StringSwitch<ExceptionBehavior>(ExceptionArg)		return StringSwitch<ExceptionBehavior>(ExceptionArg)
.Case("fpexcept.ignore", ebIgnore)		.Case("fpexcept.ignore", ebIgnore)
.Case("fpexcept.maytrap", ebMayTrap)		.Case("fpexcept.maytrap", ebMayTrap)
.Case("fpexcept.strict", ebStrict)		.Case("fpexcept.strict", ebStrict)
.Default(ebInvalid);		.Default(ebInvalid);
}		}

		CmpInst::Predicate
		EVLIntrinsic::getCmpPredicate() const {
		return static_cast<CmpInst::Predicate>(cast<ConstantInt>(getArgOperand(4))->getZExtValue());
		}

		bool EVLIntrinsic::isUnaryOp() const {
		switch (getIntrinsicID()) {
		default:
		return false;
		case Intrinsic::evl_fneg:
		return true;
		}
		}

		Value*
		EVLIntrinsic::GetMask() const {
		if (isBinaryOp()) { return getArgOperand(2); }
		else if (isTernaryOp()) { return getArgOperand(3); }
		else if (isUnaryOp()) { return getArgOperand(1); }
		else return nullptr;
		}

		Value*
		EVLIntrinsic::GetVectorLength() const {
		if (isBinaryOp()) { return getArgOperand(3); }
		else if (isTernaryOp()) { return getArgOperand(4); }
		else if (isUnaryOp()) { return getArgOperand(2); }
		else return nullptr;
		}

		bool EVLIntrinsic::isBinaryOp() const {
		switch (getIntrinsicID()) {
		default:
		return false;

		case Intrinsic::evl_and:
		case Intrinsic::evl_or:
		case Intrinsic::evl_xor:
		case Intrinsic::evl_ashr:
		case Intrinsic::evl_lshr:
		case Intrinsic::evl_shl:

		case Intrinsic::evl_fadd:
		case Intrinsic::evl_fsub:
		case Intrinsic::evl_fmul:
		case Intrinsic::evl_fdiv:
		case Intrinsic::evl_frem:

		case Intrinsic::evl_reduce_or:
		case Intrinsic::evl_reduce_xor:
		case Intrinsic::evl_reduce_add:
		case Intrinsic::evl_reduce_mul:
		case Intrinsic::evl_reduce_smax:
		case Intrinsic::evl_reduce_smin:
		case Intrinsic::evl_reduce_umax:
		case Intrinsic::evl_reduce_umin:

		case Intrinsic::evl_reduce_fadd:
		case Intrinsic::evl_reduce_fmul:
		case Intrinsic::evl_reduce_fmax:
		case Intrinsic::evl_reduce_fmin:

		case Intrinsic::evl_add:
		case Intrinsic::evl_sub:
		case Intrinsic::evl_mul:
		case Intrinsic::evl_udiv:
		case Intrinsic::evl_sdiv:
		case Intrinsic::evl_urem:
		case Intrinsic::evl_srem:
		return true;
		}
		}

		bool EVLIntrinsic::isTernaryOp() const {
		switch (getIntrinsicID()) {
		default:
		return false;
		case Intrinsic::evl_fma:
		case Intrinsic::evl_select:
		return true;
		}
		}

bool ConstrainedFPIntrinsic::isUnaryOp() const {		bool ConstrainedFPIntrinsic::isUnaryOp() const {
switch (getIntrinsicID()) {		switch (getIntrinsicID()) {
default:		default:
return false;		return false;
case Intrinsic::experimental_constrained_sqrt:		case Intrinsic::experimental_constrained_sqrt:
case Intrinsic::experimental_constrained_sin:		case Intrinsic::experimental_constrained_sin:
case Intrinsic::experimental_constrained_cos:		case Intrinsic::experimental_constrained_cos:
case Intrinsic::experimental_constrained_exp:		case Intrinsic::experimental_constrained_exp:
Show All 23 Lines

lib/IR/Verifier.cpp

Show First 20 Lines • Show All 1,646 Lines • ▼ Show 20 Lines

// Check parameter attributes against a function type.		// Check parameter attributes against a function type.
// The value V is printed in error messages.		// The value V is printed in error messages.
void Verifier::verifyFunctionAttrs(FunctionType *FT, AttributeList Attrs,		void Verifier::verifyFunctionAttrs(FunctionType *FT, AttributeList Attrs,
const Value *V) {		const Value *V) {
if (Attrs.isEmpty())		if (Attrs.isEmpty())
return;		return;

		bool SawMask = false;
bool SawNest = false;		bool SawNest = false;
		bool SawPassthru = false;
bool SawReturned = false;		bool SawReturned = false;
bool SawSRet = false;		bool SawSRet = false;
bool SawSwiftSelf = false;		bool SawSwiftSelf = false;
bool SawSwiftError = false;		bool SawSwiftError = false;
		bool SawVectorLength = false;

// Verify return value attributes.		// Verify return value attributes.
AttributeSet RetAttrs = Attrs.getRetAttributes();		AttributeSet RetAttrs = Attrs.getRetAttributes();
Assert((!RetAttrs.hasAttribute(Attribute::ByVal) &&		Assert((!RetAttrs.hasAttribute(Attribute::ByVal) &&
!RetAttrs.hasAttribute(Attribute::Nest) &&		!RetAttrs.hasAttribute(Attribute::Nest) &&
!RetAttrs.hasAttribute(Attribute::StructRet) &&		!RetAttrs.hasAttribute(Attribute::StructRet) &&
!RetAttrs.hasAttribute(Attribute::NoCapture) &&		!RetAttrs.hasAttribute(Attribute::NoCapture) &&
!RetAttrs.hasAttribute(Attribute::Returned) &&		!RetAttrs.hasAttribute(Attribute::Returned) &&
▲ Show 20 Lines • Show All 46 Lines • ▼ Show 20 Lines	for (unsigned i = 0, e = FT->getNumParams(); i != e; ++i) {
}		}

if (ArgAttrs.hasAttribute(Attribute::SwiftError)) {		if (ArgAttrs.hasAttribute(Attribute::SwiftError)) {
Assert(!SawSwiftError, "Cannot have multiple 'swifterror' parameters!",		Assert(!SawSwiftError, "Cannot have multiple 'swifterror' parameters!",
V);		V);
SawSwiftError = true;		SawSwiftError = true;
}		}

		if (ArgAttrs.hasAttribute(Attribute::VectorLength)) {
		Assert(!SawVectorLength, "Cannot have multiple 'vlen' parameters!",
		V);
		SawVectorLength = true;
		}

		if (ArgAttrs.hasAttribute(Attribute::Passthru)) {
		Assert(!SawPassthru, "Cannot have multiple 'passthru' parameters!",
		V);
		SawPassthru = true;
		}

		if (ArgAttrs.hasAttribute(Attribute::Mask)) {
		Assert(!SawMask, "Cannot have multiple 'mask' parameters!",
		V);
		SawMask = true;
		}

if (ArgAttrs.hasAttribute(Attribute::InAlloca)) {		if (ArgAttrs.hasAttribute(Attribute::InAlloca)) {
Assert(i == FT->getNumParams() - 1,		Assert(i == FT->getNumParams() - 1,
"inalloca isn't on the last parameter!", V);		"inalloca isn't on the last parameter!", V);
}		}
}		}

		Assert(!SawPassthru \|\| SawMask,
		"Cannot have 'passthru' parameter without 'mask' parameter!", V);

if (!Attrs.hasAttributes(AttributeList::FunctionIndex))		if (!Attrs.hasAttributes(AttributeList::FunctionIndex))
return;		return;

verifyAttributeTypes(Attrs.getFnAttributes(), /IsFunction=/true, V);		verifyAttributeTypes(Attrs.getFnAttributes(), /IsFunction=/true, V);

Assert(!(Attrs.hasFnAttribute(Attribute::ReadNone) &&		Assert(!(Attrs.hasFnAttribute(Attribute::ReadNone) &&
Attrs.hasFnAttribute(Attribute::ReadOnly)),		Attrs.hasFnAttribute(Attribute::ReadOnly)),
"Attributes 'readnone and readonly' are incompatible!", V);		"Attributes 'readnone and readonly' are incompatible!", V);
▲ Show 20 Lines • Show All 1,300 Lines • ▼ Show 20 Lines	Assert(
&II);		&II);

visitTerminator(II);		visitTerminator(II);
}		}

/// visitUnaryOperator - Check the argument to the unary operator.		/// visitUnaryOperator - Check the argument to the unary operator.
///		///
void Verifier::visitUnaryOperator(UnaryOperator &U) {		void Verifier::visitUnaryOperator(UnaryOperator &U) {
Assert(U.getType() == U.getOperand(0)->getType(),		Assert(U.getType() == U.getOperand(0)->getType(),
"Unary operators must have same type for"		"Unary operators must have same type for"
"operands and result!",		"operands and result!",
&U);		&U);

switch (U.getOpcode()) {		switch (U.getOpcode()) {
// Check that floating-point arithmetic operators are only used with		// Check that floating-point arithmetic operators are only used with
// floating-point operands.		// floating-point operands.
case Instruction::FNeg:		case Instruction::FNeg:
▲ Show 20 Lines • Show All 1,812 Lines • ▼ Show 20 Lines	struct VerifierLegacyPass : public FunctionPass {
bool doInitialization(Module &M) override {		bool doInitialization(Module &M) override {
V = llvm::make_unique<Verifier>(		V = llvm::make_unique<Verifier>(
&dbgs(), /ShouldTreatBrokenDebugInfoAsError=/false, M);		&dbgs(), /ShouldTreatBrokenDebugInfoAsError=/false, M);
return false;		return false;
}		}

bool runOnFunction(Function &F) override {		bool runOnFunction(Function &F) override {
if (!V->verify(F) && FatalErrors) {		if (!V->verify(F) && FatalErrors) {
errs() << "in function " << F.getName() << '\n';		errs() << "in function " << F.getName() << '\n';
report_fatal_error("Broken function found, compilation aborted!");		report_fatal_error("Broken function found, compilation aborted!");
}		}
return false;		return false;
}		}

bool doFinalization(Module &M) override {		bool doFinalization(Module &M) override {
bool HasErrors = false;		bool HasErrors = false;
for (Function &F : M)		for (Function &F : M)
▲ Show 20 Lines • Show All 397 Lines • Show Last 20 Lines

lib/Transforms/Utils/CodeExtractor.cpp

Show First 20 Lines • Show All 767 Lines • ▼ Show 20 Lines	if (Attr.isStringAttribute()) {
case Attribute::Convergent:		case Attribute::Convergent:
case Attribute::Dereferenceable:		case Attribute::Dereferenceable:
case Attribute::DereferenceableOrNull:		case Attribute::DereferenceableOrNull:
case Attribute::InAlloca:		case Attribute::InAlloca:
case Attribute::InReg:		case Attribute::InReg:
case Attribute::InaccessibleMemOnly:		case Attribute::InaccessibleMemOnly:
case Attribute::InaccessibleMemOrArgMemOnly:		case Attribute::InaccessibleMemOrArgMemOnly:
case Attribute::JumpTable:		case Attribute::JumpTable:
		case Attribute::Mask:
case Attribute::Naked:		case Attribute::Naked:
case Attribute::Nest:		case Attribute::Nest:
case Attribute::NoAlias:		case Attribute::NoAlias:
case Attribute::NoBuiltin:		case Attribute::NoBuiltin:
case Attribute::NoCapture:		case Attribute::NoCapture:
case Attribute::NoReturn:		case Attribute::NoReturn:
case Attribute::None:		case Attribute::None:
case Attribute::NonNull:		case Attribute::NonNull:
		case Attribute::Passthru:
case Attribute::ReadNone:		case Attribute::ReadNone:
case Attribute::ReadOnly:		case Attribute::ReadOnly:
case Attribute::Returned:		case Attribute::Returned:
case Attribute::ReturnsTwice:		case Attribute::ReturnsTwice:
case Attribute::SExt:		case Attribute::SExt:
case Attribute::Speculatable:		case Attribute::Speculatable:
case Attribute::StackAlignment:		case Attribute::StackAlignment:
case Attribute::StructRet:		case Attribute::StructRet:
case Attribute::SwiftError:		case Attribute::SwiftError:
case Attribute::SwiftSelf:		case Attribute::SwiftSelf:
		case Attribute::VectorLength:
case Attribute::WriteOnly:		case Attribute::WriteOnly:
case Attribute::ZExt:		case Attribute::ZExt:
case Attribute::EndAttrKinds:		case Attribute::EndAttrKinds:
continue;		continue;
// Those attributes should be safe to propagate to the extracted function.		// Those attributes should be safe to propagate to the extracted function.
case Attribute::AlwaysInline:		case Attribute::AlwaysInline:
case Attribute::Cold:		case Attribute::Cold:
case Attribute::NoRecurse:		case Attribute::NoRecurse:
▲ Show 20 Lines • Show All 670 Lines • Show Last 20 Lines

test/Bitcode/attributes.ll

	Show First 20 Lines • Show All 345 Lines • ▼ Show 20 Lines
	}			}

	; CHECK: define void @f59() #35			; CHECK: define void @f59() #35
	define void @f59() shadowcallstack			define void @f59() shadowcallstack
	{			{
	ret void			ret void
	}			}

				; CHECK: define <8 x double> @f60(<8 x double> passthru, <8 x i1> mask, i32 vlen) {
				define <8 x double> @f60(<8 x double> passthru, <8 x i1> mask, i32 vlen) {
				ret <8 x double> undef
				}

	; CHECK: attributes #0 = { noreturn }			; CHECK: attributes #0 = { noreturn }
	; CHECK: attributes #1 = { nounwind }			; CHECK: attributes #1 = { nounwind }
	; CHECK: attributes #2 = { readnone }			; CHECK: attributes #2 = { readnone }
	; CHECK: attributes #3 = { readonly }			; CHECK: attributes #3 = { readonly }
	; CHECK: attributes #4 = { noinline }			; CHECK: attributes #4 = { noinline }
	; CHECK: attributes #5 = { alwaysinline }			; CHECK: attributes #5 = { alwaysinline }
	; CHECK: attributes #6 = { optsize }			; CHECK: attributes #6 = { optsize }
	; CHECK: attributes #7 = { ssp }			; CHECK: attributes #7 = { ssp }
	Show All 29 Lines

test/Verifier/evl_attribs.ll

This file was added.

				; RUN: not llvm-as %s -o /dev/null 2>&1 \| FileCheck %s

				declare void @a(<16 x i1> mask %a, <16 x i1> mask %b)
				; CHECK: Cannot have multiple 'mask' parameters!

				declare void @b(<16 x i1> mask %a, i32 vlen %x, i32 vlen %y)
				; CHECK: Cannot have multiple 'vlen' parameters!

				declare <16 x double> @c(<16 x double> passthru %a)
				; CHECK: Cannot have 'passthru' parameter without 'mask' parameter!

				declare <16 x double> @d(<16 x double> passthru %a, <16 x i1> mask %M, <16 x double> passthru %b)
				; CHECK: Cannot have multiple 'passthru' parameters!

utils/TableGen/CodeGenIntrinsics.h

Show First 20 Lines • Show All 130 Lines • ▼ Show 20 Lines	struct CodeGenIntrinsic {

/// True if the intrinsic has side effects that aren't captured by any		/// True if the intrinsic has side effects that aren't captured by any
/// of the other flags.		/// of the other flags.
bool hasSideEffects;		bool hasSideEffects;

// True if the intrinsic is marked as speculatable.		// True if the intrinsic is marked as speculatable.
bool isSpeculatable;		bool isSpeculatable;

enum ArgAttribute { NoCapture, Returned, ReadOnly, WriteOnly, ReadNone };		enum ArgAttribute { Mask, NoCapture, Passthru, Returned, ReadOnly, WriteOnly, ReadNone, VectorLength };
std::vector<std::pair<unsigned, ArgAttribute>> ArgumentAttributes;		std::vector<std::pair<unsigned, ArgAttribute>> ArgumentAttributes;

bool hasProperty(enum SDNP Prop) const {		bool hasProperty(enum SDNP Prop) const {
return Properties & (1 << Prop);		return Properties & (1 << Prop);
}		}

CodeGenIntrinsic(Record *R);		CodeGenIntrinsic(Record *R);
};		};
Show All 25 Lines

utils/TableGen/CodeGenTarget.cpp

Show First 20 Lines • Show All 593 Lines • ▼ Show 20 Lines	if (TyEl->isSubClassOf("LLVMMatchType")) {
// variants with iAny types; otherwise, if the intrinsic is not		// variants with iAny types; otherwise, if the intrinsic is not
// overloaded, all the types can be specified directly.		// overloaded, all the types can be specified directly.
assert(((!TyEl->isSubClassOf("LLVMExtendedType") &&		assert(((!TyEl->isSubClassOf("LLVMExtendedType") &&
!TyEl->isSubClassOf("LLVMTruncatedType")) \|\|		!TyEl->isSubClassOf("LLVMTruncatedType")) \|\|
VT == MVT::iAny \|\| VT == MVT::vAny) &&		VT == MVT::iAny \|\| VT == MVT::vAny) &&
"Expected iAny or vAny type");		"Expected iAny or vAny type");
} else {		} else {
VT = getValueType(TyEl->getValueAsDef("VT"));		VT = getValueType(TyEl->getValueAsDef("VT"));
}
if (MVT(VT).isOverloaded()) {		if (MVT(VT).isOverloaded()) {
OverloadedVTs.push_back(VT);		OverloadedVTs.push_back(VT);
isOverloaded = true;		isOverloaded = true;
}		}
		}

// Reject invalid types.		// Reject invalid types.
if (VT == MVT::isVoid)		if (VT == MVT::isVoid)
PrintFatalError("Intrinsic '" + DefName + " has void in result type list!");		PrintFatalError("Intrinsic '" + DefName + " has void in result type list!");

IS.RetVTs.push_back(VT);		IS.RetVTs.push_back(VT);
IS.RetTypeDefs.push_back(TyEl);		IS.RetTypeDefs.push_back(TyEl);
}		}
Show All 16 Lines	if (TyEl->isSubClassOf("LLVMMatchType")) {
// It only makes sense to use the extended and truncated vector element		// It only makes sense to use the extended and truncated vector element
// variants with iAny types; otherwise, if the intrinsic is not		// variants with iAny types; otherwise, if the intrinsic is not
// overloaded, all the types can be specified directly.		// overloaded, all the types can be specified directly.
assert(((!TyEl->isSubClassOf("LLVMExtendedType") &&		assert(((!TyEl->isSubClassOf("LLVMExtendedType") &&
!TyEl->isSubClassOf("LLVMTruncatedType") &&		!TyEl->isSubClassOf("LLVMTruncatedType") &&
!TyEl->isSubClassOf("LLVMVectorSameWidth")) \|\|		!TyEl->isSubClassOf("LLVMVectorSameWidth")) \|\|
VT == MVT::iAny \|\| VT == MVT::vAny) &&		VT == MVT::iAny \|\| VT == MVT::vAny) &&
"Expected iAny or vAny type");		"Expected iAny or vAny type");
} else		} else {
VT = getValueType(TyEl->getValueAsDef("VT"));		VT = getValueType(TyEl->getValueAsDef("VT"));

if (MVT(VT).isOverloaded()) {		if (MVT(VT).isOverloaded()) {
OverloadedVTs.push_back(VT);		OverloadedVTs.push_back(VT);
isOverloaded = true;		isOverloaded = true;
}		}
		}


// Reject invalid types.		// Reject invalid types.
if (VT == MVT::isVoid && i != e-1 /void at end means varargs/)		if (VT == MVT::isVoid && i != e-1 /void at end means varargs/)
PrintFatalError("Intrinsic '" + DefName + " has void in result type list!");		PrintFatalError("Intrinsic '" + DefName + " has void in result type list!");

IS.ParamVTs.push_back(VT);		IS.ParamVTs.push_back(VT);
IS.ParamTypeDefs.push_back(TyEl);		IS.ParamTypeDefs.push_back(TyEl);
}		}
Show All 35 Lines	for (unsigned i = 0, e = PropList->size(); i != e; ++i) {
else if (Property->getName() == "IntrHasSideEffects")		else if (Property->getName() == "IntrHasSideEffects")
hasSideEffects = true;		hasSideEffects = true;
else if (Property->isSubClassOf("NoCapture")) {		else if (Property->isSubClassOf("NoCapture")) {
unsigned ArgNo = Property->getValueAsInt("ArgNo");		unsigned ArgNo = Property->getValueAsInt("ArgNo");
ArgumentAttributes.push_back(std::make_pair(ArgNo, NoCapture));		ArgumentAttributes.push_back(std::make_pair(ArgNo, NoCapture));
} else if (Property->isSubClassOf("Returned")) {		} else if (Property->isSubClassOf("Returned")) {
unsigned ArgNo = Property->getValueAsInt("ArgNo");		unsigned ArgNo = Property->getValueAsInt("ArgNo");
ArgumentAttributes.push_back(std::make_pair(ArgNo, Returned));		ArgumentAttributes.push_back(std::make_pair(ArgNo, Returned));
		} else if (Property->isSubClassOf("VectorLength")) {
		unsigned ArgNo = Property->getValueAsInt("ArgNo");
		ArgumentAttributes.push_back(std::make_pair(ArgNo, VectorLength));
		} else if (Property->isSubClassOf("Mask")) {
		unsigned ArgNo = Property->getValueAsInt("ArgNo");
		ArgumentAttributes.push_back(std::make_pair(ArgNo, Mask));
		} else if (Property->isSubClassOf("Passthru")) {
		unsigned ArgNo = Property->getValueAsInt("ArgNo");
		ArgumentAttributes.push_back(std::make_pair(ArgNo, Passthru));
} else if (Property->isSubClassOf("ReadOnly")) {		} else if (Property->isSubClassOf("ReadOnly")) {
unsigned ArgNo = Property->getValueAsInt("ArgNo");		unsigned ArgNo = Property->getValueAsInt("ArgNo");
ArgumentAttributes.push_back(std::make_pair(ArgNo, ReadOnly));		ArgumentAttributes.push_back(std::make_pair(ArgNo, ReadOnly));
} else if (Property->isSubClassOf("WriteOnly")) {		} else if (Property->isSubClassOf("WriteOnly")) {
unsigned ArgNo = Property->getValueAsInt("ArgNo");		unsigned ArgNo = Property->getValueAsInt("ArgNo");
ArgumentAttributes.push_back(std::make_pair(ArgNo, WriteOnly));		ArgumentAttributes.push_back(std::make_pair(ArgNo, WriteOnly));
} else if (Property->isSubClassOf("ReadNone")) {		} else if (Property->isSubClassOf("ReadNone")) {
unsigned ArgNo = Property->getValueAsInt("ArgNo");		unsigned ArgNo = Property->getValueAsInt("ArgNo");
Show All 11 Lines

utils/TableGen/IntrinsicEmitter.cpp

Show First 20 Lines • Show All 588 Lines • ▼ Show 20 Lines	if (ae) {
addComma = true;		addComma = true;
break;		break;
case CodeGenIntrinsic::Returned:		case CodeGenIntrinsic::Returned:
if (addComma)		if (addComma)
OS << ",";		OS << ",";
OS << "Attribute::Returned";		OS << "Attribute::Returned";
addComma = true;		addComma = true;
break;		break;
		case CodeGenIntrinsic::VectorLength:
		if (addComma)
		OS << ",";
		OS << "Attribute::VectorLength";
		addComma = true;
		break;
		case CodeGenIntrinsic::Mask:
		if (addComma)
		OS << ",";
		OS << "Attribute::Mask";
		addComma = true;
		break;
		case CodeGenIntrinsic::Passthru:
		if (addComma)
		OS << ",";
		OS << "Attribute::Passthru";
		addComma = true;
		break;
case CodeGenIntrinsic::ReadOnly:		case CodeGenIntrinsic::ReadOnly:
if (addComma)		if (addComma)
OS << ",";		OS << ",";
OS << "Attribute::ReadOnly";		OS << "Attribute::ReadOnly";
addComma = true;		addComma = true;
break;		break;
case CodeGenIntrinsic::WriteOnly:		case CodeGenIntrinsic::WriteOnly:
if (addComma)		if (addComma)
▲ Show 20 Lines • Show All 258 Lines • Show Last 20 Lines

This is an archive of the discontinued LLVM Phabricator instance.

RFC: Explicit Vector Length Intrinsics and AttributesNeeds ReviewPublic

Details

Rationale

Proposed changes

Intrinsics

Attributes

An example

Lowering

LLVM-SVE with predication and dynamic vector length (RISC-V V extension, NEC SX-Aurora)

Lowering for targets w/o dynamic vector length (AVX512, ARM SVE, ..)

Lowering for fixed-width SIMD w/o predication (SSE, NEON, AdvSimd, ..)

Example 1: safe fdiv

Pros & Cons

Pros

Cons

Diff Detail

Event Timeline

Example

Vector register coalescing

Vector register coalescing

Excess lane semantics

Changes

Changes

Changes

Revision Contents

Diff 183102

include/llvm/Bitcode/LLVMBitCodes.h

include/llvm/CodeGen/ISDOpcodes.h

include/llvm/CodeGen/SelectionDAG.h

include/llvm/CodeGen/SelectionDAGNodes.h

include/llvm/IR/Attributes.td

include/llvm/IR/EVLBuilder.h

include/llvm/IR/IntrinsicInst.h

include/llvm/IR/Intrinsics.td

include/llvm/Target/TargetSelectionDAG.td

lib/AsmParser/LLLexer.cpp

lib/AsmParser/LLParser.cpp

lib/AsmParser/LLToken.h

lib/Bitcode/Reader/BitcodeReader.cpp

lib/Bitcode/Writer/BitcodeWriter.cpp

lib/CodeGen/SelectionDAG/DAGCombiner.cpp

lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp

lib/CodeGen/SelectionDAG/LegalizeTypes.h

lib/CodeGen/SelectionDAG/SelectionDAG.cpp

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.h

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp

lib/CodeGen/SelectionDAG/SelectionDAGDumper.cpp

lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp

lib/IR/Attributes.cpp

lib/IR/CMakeLists.txt

lib/IR/EVLBuilder.cpp

lib/IR/IntrinsicInst.cpp

lib/IR/Verifier.cpp

lib/Transforms/Utils/CodeExtractor.cpp

test/Bitcode/attributes.ll

test/Verifier/evl_attribs.ll

utils/TableGen/CodeGenIntrinsics.h

utils/TableGen/CodeGenTarget.cpp

utils/TableGen/IntrinsicEmitter.cpp

RFC: Explicit Vector Length Intrinsics and Attributes
Needs ReviewPublic