This is an archive of the discontinued LLVM Phabricator instance.

llvm/include/llvm/IR/Intrinsics.td
1230	I asked @sstefan1 to add nosync and nofree to the td file and other places needed to use them. That should make your changes more concise. I would have argued we could even do the `#inline` trick here, or maybe some .td magic, to avoid this replication, but I will not force anything.
llvm/include/llvm/IR/VPIntrinsics.def
2	Nit: copy&past
28	To be honest, I would have made the Opcode for vp intrinsics of existing functions `VP_EXISTING` (or similar) and used a single macro definition here, assuming all vp intrinsics so far have a unique corresponding scalar version. These are two unrelated points you can consider (I won't force either). (If you device to do both changes, you can even get away by only defining the existing opcode in the intrinsic "macro definition").
90	Nit: I'd run clang format on these files as well. Should minimize diffs in the long run.
llvm/lib/IR/IntrinsicInst.cpp
199	Does it make sense to have this corner case in here or should we just make it an `Optional<int64_t>` and call it a day? You can also remove the lambda if you move the code, except there is a reason to keep it.
218	Default clauses make it harder to find all switches that might need to be adjusted. If there is no strong reason to have them we should not.
248	Leftover?
261	I doubt this is a sensible default. I doubt one should return anything for a non-vp intrinsic.
292	Nit: braces.
308	Nit: clang format your changes (or patches) (clang-format.py for vim has the formatdiff option)
llvm/unittests/IR/VPIntrinsicTest.cpp
21	Not: `using namsepace llvm;` is probably less confusing here

jdoerfert added inline comments.Nov 12 2019, 10:48 PM

llvm/lib/IR/IntrinsicInst.cpp
218	Forget this one.

Thanks for the comments so far :)

llvm/include/llvm/IR/Intrinsics.td
1230	I would have argued we could even do the `#inline` trick here, or maybe some .td magic, to avoid this replication, but I will not force anything. I agree but refactoring `Intrinsics.td` is out of scope for this patch.
llvm/include/llvm/IR/VPIntrinsics.def
28	assuming all vp intrinsics so far have a unique corresponding scalar version. They don't. That's why it's separate. There will be more cases for constrained fp, reduction and memory ops.
llvm/lib/IR/IntrinsicInst.cpp
199	Actually, the excessive vector length case should `return None` to imply that the static vector length could not be inferred. Btw, later patches will reuse the lambda.
261	If there is no functional IR opcode this is still a call to an intrinsic function. It's natural to return `Call` in this case.
308	This particular function is clang-formatted already.. The existing code in `IntrinsicInst.cpp` is not. I'll do a patch clang-formatting the entire file after this one.

Changes

Cleanup, addressed nitpicks
Assume vlen can not be ignored for the corner case of an excessive static vector length (safe default. Pass -1 to enable all lanes in those cases, should they ever occur..).
made integer intrinsics nosync. Will consider an explicit nofree for VP intrinsics with memory args if the intrinsic isn't readnone already.

Harbormaster completed remote builds in B40873: Diff 229029.Nov 13 2019, 1:55 AM

samparker added inline comments.Nov 13 2019, 2:08 AM

llvm/lib/IR/IntrinsicInst.cpp
216	Ah, sorry, I had missed this was for constant lengths!

sstefan1 added inline comments.Nov 13 2019, 2:24 AM

llvm/include/llvm/IR/Intrinsics.td
1230	I kind of started some of this in D65377 and the approach will try and propose is an opt-out list. If I understood correctly I should try to do that part first (and propose). @jdoerfert correct me if I'm wrong. Hopefully I'll start this today.

jdoerfert added inline comments.Nov 13 2019, 8:28 AM

llvm/include/llvm/IR/Intrinsics.td
1230	@simoll Agreed. This is out of scope. @sstefan1 I think your plan sounds good.
llvm/include/llvm/IR/VPIntrinsics.def
2	now too long and wrapped :( No need for `llvm/`, which is confusing as it lives in `llvm/IR`. Also "Instructions" might be a bit to broad? Idk, maybe: "vector intrinsic descriptions"
28	I see. Though, even if later intriniscs do not correspond to scalar instructions, the changes would still be possible. I admit, at this point it is a question if you like: REGISTER_VP_INTRINSIC(vp_add, 2, 3) HANDLE_VP_TO_OC(vp_add, Add) REGISTER_VP_INTRINSIC(vp_red, 2, 3) better or worse than REGISTER_VP_FROM_SCALAR_INTRINSIC(Add, 2, 3) REGISTER_VP_INTRINSIC(Red, 2, 3) with the appropriate definition of the above: #define REGISTER_VP_FROM_SCALAR_INTRINSIC(OC, MP, VP) \ REGISTER_VP_INTRINSIC(OC, MP, VP) \ HANDLE_VP_TO_OC(vp_##OC, OC)
llvm/lib/IR/IntrinsicInst.cpp
199	Fine with me.
261	not really convinced but OK.

simoll planned changes to this revision.Nov 13 2019, 9:01 AM

simoll marked 14 inline comments as done.

simoll added inline comments.

llvm/include/llvm/IR/VPIntrinsics.def
2	//===-- IR/VPInstruction.def - Describes llvm.vp.* Intrinsics -- C++ --===//
28	I prefer the explicit form REGISTER_VP_INTRINSIC(vp_add, 2, 3) HANDLE_VP_TO_OC(vp_add, Add) The explicit form gives you much nicer diffs. You don't have to read a meta macro to understand what is going on. This file will undergo changes with the next patches. I wouldn't spend too much time compacting it before hasn't "settled down".

Finally fixed the IR/VPIntrinsics.def header line.

Harbormaster completed remote builds in B40894: Diff 229116.Nov 13 2019, 9:06 AM

VPInstruction.def -> VPIntrinsics.def -.-

Harbormaster completed remote builds in B40895: Diff 229118.Nov 13 2019, 9:07 AM

I think I'm fine with this. Anyone else?

llvm/include/llvm/IR/VPIntrinsics.def
28	OK.

+1 from me! Thanks for doing this!

llvm/docs/LangRef.rst
15287	Nit: instead of talking about the MSB being zero, is it simpler to say that it is effective if is a positive number (if that's what you mean)?
15357	Nit: please ignore if you disagree, but perhaps but perhaps shorter/simpler is: The first two operands and the result have the same vector of integer type. -> The operands and the result are integer vector types. (this applies to all/most descriptions)
llvm/lib/IR/IntrinsicInst.cpp
182	Nit: don't think this lamba adds anything, it's just called once.

In D69891#1745257, @SjoerdMeijer wrote:

+1 from me! Thanks for doing this!

Thanks for reviewing! I'll upload a final update for the LangRef.

llvm/docs/LangRef.rst
15287	Good point. I'll rewrite that part referring to the vlen as a signed integer.
15357	I'd like to shorten that. However, there are always mask and vlen operands so we cannot say that all operands have an integer vector type.
llvm/lib/IR/IntrinsicInst.cpp
182	It'll be called twice in the final version (see reference patch).

LangRef rewording: explicit vlen is a signed integer.

Harbormaster completed remote builds in B40943: Diff 229249.Nov 14 2019, 2:13 AM

Thanks for your efforts to move this forward @simoll!

llvm/docs/LangRef.rst
15283	I expect this to also work for scalable vectors, so maybe add a case here for `<vscale x W x T>` as well?
15285	nit: bit -> i1
15287	Thanks, I wondered something similar. I had actually expected the `%evl` to only make sense in the context of scalable vectors? Is it worth expressing what it means if `%evl` is larger than the size of the vector?
15289	nit: 80 char limit (also for other places in this file)
15321	Have you considered adding an extra argument to specify the value of the false lanes, similar to how this is currently done for `llvm.masked.load`? By passing `undef`, it would have similar behaviour as the current definition, yet it would remove the need to add explicit `select` statements for handling merging- or zeroing predication if an instruction supports it. For SVE for example, most instructions have either merging or zeroing predication and the intrinsics expose the merging/zeroing/undef predication directly in the C/C++ intrinsic API. I think it would be really nice to have that capability represented in the vp intrinsics as well.

Adding scalable type example declarations
reformatting langref for character limit.

llvm/docs/LangRef.rst
15287	I had actually expected the %evl to only make sense in the context of scalable vectors? EVL does not require scalable vectors. We'll use it with the `<256 x double>` type for example.
15321	Yes i considered that. However, as you suggested you can express this without loss with a `select` and i'd like to keep the intrinsics simple.

Thanks for your feedback!

Formatted LangRef for char limit

Harbormaster completed remote builds in B40956: Diff 229306.Nov 14 2019, 7:12 AM

jdoerfert added inline comments.Nov 14 2019, 8:48 AM

llvm/docs/LangRef.rst
15283	If that support is added later, I would add the wording later too.
15321	FWIW, I like the idea of simple intrinsics and explicit selection.

sdesmalen added inline comments.Nov 14 2019, 9:38 AM

llvm/docs/LangRef.rst
15283	@simoll is anything special needed to support scalable vectors in this context?
15321	If we want to keep the vp intrinsics simple, shouldn't we reconsider the need for `%evl` as an explicit parameter? That value can be folded into the predicate using a special intrinsic that enables the lanes upto `%evl`, e.g. `<256 x i1> llvm.vp.enable.lanes(<256 x i1> %p, i32 %evl)` If any operation needs `%evl` as an explicit parameter, it would be a matter of having a pattern that matches the enable.lanes intrinsic, similar to how merging/zeroing predication can be implemented by matching the selects.
llvm/include/llvm/IR/IntrinsicInst.h
233	Can this be renamed to `getVectorLength()` and have it return `ElementCount` instead? (at which point we can drop the Optional)

ElementCount getStaticVectorLength()
unit test for canIgnoreVectorLength on scalable examples.

Harbormaster completed remote builds in B41011: Diff 229490.Nov 15 2019, 2:59 AM

simoll added inline comments.Nov 15 2019, 3:00 AM

llvm/docs/LangRef.rst
15283	@simoll is anything special needed to support scalable vectors in this context? The intrinsics support scalable vectors out of the box. I've added VP intrinsics with scalable types to the individual operations below.
15321	If any operation needs %evl as an explicit parameter, it would be a matter of having a pattern that matches the enable.lanes intrinsic, similar to how merging/zeroing predication can be implemented by matching the selects. The SX-Aurora and RiSC-V V extension natively support an explicit vector length parameter just as they support predicated vector. Mask and EVL should be treated the same: if there is a vector mask parameter there should also be an EVL parameter.
llvm/include/llvm/IR/IntrinsicInst.h
233	Sure. That actually makes it simpler.

sdesmalen added inline comments.Nov 15 2019, 5:49 AM

llvm/docs/LangRef.rst
15321	I understand that certain instructions take an explicit vector length parameter, but that doesn't really explain why an arch-independent intrinsic needs to expose this. Is there any reason that modelling this using something like `llvm.vp.enable.lanes` won't work?
llvm/include/llvm/IR/IntrinsicInst.h
233	Awesome, thanks!

simoll marked an inline comment as done.Nov 15 2019, 7:07 AM

simoll added inline comments.

llvm/docs/LangRef.rst
15321	That's already two architectures that do support EVL on all vector instructions and it is vital to model it. EVL has performance implications (smaller EVL -> lower latency; no need to compute and store a mask where EVL can be used) . For SVE/MVE, you can simply turn the feature off by passing `i32 -1`. From the perspective of RVV and SX-Aurora, saying we do not need EVL is equivalent to saying we do not need vector predication.

cameron.mcinally added a subscriber: cameron.mcinally.Nov 18 2019, 8:59 AM

cameron.mcinally added inline comments.

llvm/docs/LangRef.rst
15321	Yes i considered that. However, as you suggested you can express this without loss with a select and i'd like to keep the intrinsics simple. Having an explicit `passthru` is a good idea. Keeping the select glued to the intrinsics may be difficult. For example: `select %mask, %r, undef` It would be easy for a select peep to replace this with %r, since the masked bits are undefined. But for a situation like a trapping instruction, we wouldn't want the the masked elements to execute.

simoll marked an inline comment as done.Nov 18 2019, 9:27 AM

simoll added inline comments.

llvm/docs/LangRef.rst
15321	select %mask, %r, undef This is the current behavior - the values on masked-off lanes (by EVL or the mask) are `undef`. My problem with passthru is that it encourages people to specify arbitrary values on masked-off lanes and to do so often. Yes, for some combination of target architecture, instruction, passthru value and type it might be a perfect match for one machine instruction . But, for most combinations you will have to emit an extra `select` anyway. Besides, if the select is explicit and can be folded into another operation to simplify it, then that is actually a good thing. But there is more, for things like coalescing predicated registers and vector register spilling, it is helpful to have undef-on-masked-off and a separate select instruction that can be re-scheduled.

sdesmalen added inline comments.Nov 18 2019, 10:26 AM

llvm/docs/LangRef.rst
15321	If we'd use a `select` instruction to implement the zeroing/merging predication, the issue with the current set of intrinsics is that we don't know how many lanes to select because the predicate mask is no longer the determining factor to limit which lanes are the valid lanes. Suggesting that other architectures can just turn off the feature by setting the `%evl` parameter to `-1` doesn't change the fact that the code generator still needs to handle the IR if `%evl` is not `-1`. It would be valid IR and there is nothing that guarantees LLVM won't generate the generic intrinsics with a value that is not `-1`. This is not really special to `select`, but common to other operations that take a mask as well. For our case it means we can't implement zeroing/merging predication with this set of intrinsics using `select`. If we want to solve the `select` issue and also keep the intrinsics simple, my suggestion was to combine the explicit vector length with the mask using an explicit intrinsic like `@llvm.vp.enable.lanes`. Because this is an explicit intrinsic, the code-generator can simply extract the `%evl` parameter and pass that directly to the instructions for RVV/SXA. This is what happens for many other intrinsics in LLVM already, like `masked.load`/`masked.gather` that support only a single addressing mode, where it is up to the code-generator to pick apart the value into operands that are suited for a more optimal load instruction. Without having heard your thoughts on this suggestion, I would have to guess that your reservation is the possibility of LLVM hoisting/separating the logic that merges predicate mask and `%evl` value in some way. That would mean having to do some tricks (think CodeGenPrep) to keep the values together and recognizable for CodeGen. And that's the exact same thing we would like to avoid for supporting merging/zeroing predication, hence the suggestion for the explicit `passthru` parameter.

sdesmalen added inline comments.Nov 18 2019, 10:26 AM

llvm/docs/LangRef.rst
15321	My problem with passthru is that it encourages people to specify arbitrary values on masked-off lanes and to do so often. Yes, for some combination of target architecture, instruction, passthru value and type it might be a perfect match for one machine instruction . But, for most combinations you will have to emit an extra select anyway. The most common passthru values in practice will be one of the source operands or zero/undef though. These are supported by SVE/SVE2 as a first-class feature (and I could imagine other/future vector architectures having special support for that as well). Other values can indeed be lowered to an explicit select instruction. But there is more, for things like coalescing predicated registers and vector register spilling, it is helpful to have undef-on-masked-off and a separate select instruction that can be re-scheduled. At that point the intrinsic will already be lowered to individual instructions, so I don't think the definition of the intrinsic has much of an impact on that.

cameron.mcinally added inline comments.Nov 18 2019, 10:38 AM

llvm/docs/LangRef.rst
15321	Yes, you're right. Having an implicit undef and explicit merge is good.

simoll marked an inline comment as done.Nov 18 2019, 10:54 AM

simoll added inline comments.

llvm/docs/LangRef.rst
15321	Suggesting that other architectures can just turn off the feature by setting the %evl parameter to -1 doesn't change the fact that the code generator still needs to handle the IR if %evl is not -1 Other targets do not need to consider `%evl` because there will be an `ExpandVectorPredication` pass that folds the EVL into the vector mask. Backends can request through TTI that the `%evl` be folded away and ignore that parameter from there on. An early version of that pass is included in the VP reference patch. The TTI changes and the expansion pass are planned for the next patch after this one.

ping?

SjoerdMeijer added inline comments.Nov 22 2019, 7:54 AM

llvm/docs/LangRef.rst
15321	Don't think this discussion reached consensus, and I missed the nuances here earlier / changed my mind on it. The prior art on this also uses an explicit `passthru`, and it looks more generic than patching that up with an `select` later; you can always pass an `undef` in? I also read the description of %evl on line 14664 above again, and thought the definition could be tightened a bit, i.e. specify an upperbound, also just for clarity: The explicit vector length parameter always has the type `i32` and is a signed integer value. The explicit vector length is only effective if it is non-negative. Results are only computed for enabled lanes. A lane is enabled if the mask at that position is true and, if effective, where the lane position is below the explicit vector length. And then started drafting something to see if I could describe the proposed behaviour: The explicit vector length parameter always has the type `i32` and is a signed integer value. The explicit vector length (%evl) is only effective if it is non-negative, and when that is the case, the effective vector length is used to calculate a mask, and its value is: 0 <= evl <= W, where W is the vector length. This creates a mask, `%EVLmask`, with all elements `0 <= i <= evl` set to `True`, and all other lanes `evl < i <= W` to `False`. A new mask `%M` is calculated with an element-wise AND from `%mask` and `%EVLmask`: M = Vmask AND EVLmask A vector operation `<opcode>` on vectors `A` and `B` calculates: A <opcode> B = { A[i] <opcode> B[i] if M[i] = T, and { undef otherwise and if I'm not mistaken we are now discussing if `undef` here should be `undef` or a `passthru`
15334	nit: perhaps more consistent `%avl` -> `%evl` ?

rkruppe added inline comments.Nov 24 2019, 7:34 AM

llvm/docs/LangRef.rst
15321	Sjoerd's summary matches my understanding of the question here. I believe so far nobody spelled out that there's actually a material difference between a `passthru` argument and a separate `select`. The mask that is readily available for the `select` (the `%mask` passed to the VP instruction) isn't the same as the "enabled lanes" computed from mask and EVL (`%M` in Sjoerd's exposition above), which is extremely undesirable to materialize on machines with a native equipvalent of EVL (such as SX-Aurora and RISC-V). In particular, if `%mask` is computed by another VP instruction, then lanes of `%mask` past the relevant EVL are undef, not false. That means a select like `select %mask, %v, zeroinitializer` does not actually make the disabled lanes zero, past EVL they are again undef (assuming `%x` and `%mask` were computed by VP instructions with the same EVL). In contrast, a passthru value would (presumably, I didn't see anyone object) apply to all disabled lanes. I don't know how much this matters, since usually consumers of the vector will also only work up to the same EVL. But there are exceptions, such as the code pattern for vectorizing (associative or `-ffast-math`) reductions for RISC-V V. Briefly put, that pattern is to accumulate a vector of partial (e.g.) sums in the loop, preserving lanes past the current EVL (which may be smaller than the EVL in some previous iterations) so they'll still participate in the final reduction after the loop. This can be expressed perfectly with `passthru`. Without `passthru`, that kind of code seems to require juggling two EVLs (the actual current EVL for this iteration, and the maximum EVL across all iterations) as well as materializing what Sjoerd called `%EVLmask` and `%M` above. This seems much harder for a backend to pattern-match and turn into acceptable code than the `select %m, vp.something(%a, %b, %m, -1), zeroinitializer` pattern. I previously felt that `passthru` would be nice to have for backend maintainers (including myself) but perhaps not worth the duplication of IR functionality (having two ways to do selects). However, given the differences I just described, I don't think "just use `select`" is workable. That doesn't necessarily mean we need `passthru` arguments on every VP instruction, I could also imagine adding a `vp.passthru(%v, %passthru, %mask, %evl)` intrinsic that takes from the first operand for all enabled lanes and the second for the other lanes. But I feel like once the floodgates are opened for doing merging "differently", it's kind of a toss-up whether it's simpler to create a special intrinsic for it or make it part of the general recipe for VP instructions.

Moved the discussion to the RFC.

llvm/docs/LangRef.rst
15321	A <opcode> B = { A[i] <opcode> B[i] if M[i] = T, and { undef otherwise That's an accurate description of `%evl` semantics.
15334	Yes. I'll change that to `%evl`.

rogfer01 added a subscriber: rogfer01.Nov 25 2019, 5:28 AM

dmgreen added a subscriber: dmgreen.Nov 25 2019, 2:36 PM

Clarify documentation on preserved lanes
explicit vlen arg is either negative or (new requirement) less-equal-than the number of lanes of the operation.

SjoerdMeijer mentioned this in D57504: RFC: Prototype & Roadmap for vector predication in LLVM.Jan 30 2020, 8:09 AM

Just wanted to check your plans because the status is "planned changes to this revision".
I think I am ready to LGTM this, if someone else LGTM this too, e.g. @rkruppe, but will wait if I missed something or if you're planning more changes.

Comments
Changed phrasing of %mask && %evl function along @SjoerdMeijer sketch. This covers the planned changes (%evl <= W).
NoSync change should probably go into a separate commit.

simoll marked 18 inline comments as done.Jan 31 2020, 8:19 AM

Unit tests: fail. 62371 tests passed, 3 failed and 839 were skipped.

failed: LLVM.CodeGen/AMDGPU/llvm.amdgcn.s.buffer.load.ll
failed: LLVM.CodeGen/AMDGPU/smrd.ll
failed: libc++.std/thread/thread_mutex/thread_mutex_requirements/thread_mutex_requirements_mutex/thread_mutex_class/try_lock.pass.cpp

clang-tidy: fail. clang-tidy found 0 errors and 18 warnings. 0 of them are added as review comments below (why?).

clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Pre-merge checks is in beta. Report issue. Please join beta or enable it for your project.

Harbormaster failed remote builds in B45460: Diff 241737!Jan 31 2020, 8:47 AM

I know it's very late for me to be bringing this up, but I have a couple of broad questions -- one minor, one less minor.

First, when we introduced the constrained FP intrinsics Chandler suggested inserting ".experimental" in the names because of the likelihood that something would need to change before we were done and the "experimental" modifier would make the auto-upgrader handling slightly less ugly. That same reasoning seems like it would apply here.

Second, I'm not 100% comfortable with how the explicit vector length argument is represented. Yes, I know, I should have brought this up about a year ago. I don't have any objections to the existence of this argument. I understand that we need some way to handle it. I'm just wondering if we can handle it in some way that makes it more like an optional argument. I'm not entirely clear where this value comes from, but it seems like whatever generated it must know that we're generating SVE code, right? My concern is that for targets that don't do SVE kinds of things we still have this entirely redundant, if not meaningless, operand to carry around. What would you think of this being provided as an operand bundle when it is needed?

I'm not entirely clear where this value comes from, but it seems like whatever generated it must know that we're generating SVE code, right?

This is not architecture specific, and thus this is not assuming SVE. In the case of SVE, the vector length is unknown at compile time (it is a multiple of something), and constant at run-time. In the RISC-V vector extension, I believe it can be changed at any point. Thus, the way to obtain this value is by reading/writing a status register, or something similar, but relying on querying the architecture features. And whether it is constant at runtime, or can be changed at any point, this API seems to cover the different cases.

In D69891#1854113, @SjoerdMeijer wrote:

I'm not entirely clear where this value comes from, but it seems like whatever generated it must know that we're generating SVE code, right?

This is not architecture specific, and thus this is not assuming SVE. In the case of SVE, the vector length is unknown at compile time (it is a multiple of something), and constant at run-time. In the RISC-V vector extension, I believe it can be changed at any point. Thus, the way to obtain this value is by reading/writing a status register, or something similar, but relying on querying the architecture features. And whether it is constant at runtime, or can be changed at any point, this API seems to cover the different cases.

You're mixing up different meanings of "vector length" here:

The number of elements in a scalable vector is a constant multiple of vscale which is unknown at compile time but constant during any given program execution, for both SVE and RISC-V V (though apparently not for the ISA Jacob et al. are working on, which raises all sorts of questions outside the scope of VP intrinsics so let's not get into that now)
The %evl parameter of VP intrinsics is a runtime value from 0 to (number of elements in the vector, be it scalable or not) which serves a similar purpose as the <W x i1> mask though it is tailored for a separate and complementary hardware feature. Both EVL and mask are for predication, i.e., not doing computations on some lanes of the vector depending on runtime conditions -- but those lanes still exist.

Although the EVL argument is not architecture specific in that it has the same semantics on all targets and can be legalized on all targets, Andrew is right that in practice, one would not generate vector code using the EVL argument to perform stripmining without knowing that the target architecture has efficient hardware support for it. Instead, that argument would be fixed to a constant -1 if the code does not want/need this concept. This is not too different from many other vector operations (e.g., shufflevector can express arbitrary shuffles, but many architectures support only a subset of those efficiently, and vectorization cost models take that into account). The real question is, do targets without native support for EVL have anything to gain from making the argument optional? I don't think so:

It does not make textual LLVM IR significantly more noisy: for example, a boring old 8xi32 masked store is call void @llvm.vp.store.v8i32.p0v8i32(<8 x i32> %val, <8 x i32>* %ptr, <8 x i1> %mask, i32 -1) and all you'd be saving by making the EVL argument optional is the i32 -1 part.
It is one additional operand on instruction, but given that almost all VP operations have 3+ arguments even without the EVL argument, getting rid of it seems like a pretty minor optimization (we can revisit this if VP operations become widely used and turn out to significantly affect memory consumption, but in that scenario we can make even greater gains by making them first-class instructions).
If this extra argument complicates code that uses IRBuilder or pattern matching for purposes where it should always be -1, we can have overloads that (construct/expect) a constant -1 as EVL argument. For example, builder.CreateVectorFAdd(lhs, rhs, mask) forwarding to builder.CreateVectorFAdd(lhs, rhs, mask, builder.getInt32(-1)).
It has no impact on code generation, since the canonical way to indicate that the "EVL feature" is not used by an instruction (the EVL argument being constant -1) is trivial to identify.

So I think that making this argument optional accomplishes nothing, it only makes support for targets that do care (RISC-V V, SX-Aurora VE) more complicated.

In D69891#1852795, @andrew.w.kaylor wrote:

I know it's very late for me to be bringing this up, but I have a couple of broad questions -- one minor, one less minor.

First, when we introduced the constrained FP intrinsics Chandler suggested inserting ".experimental" in the names because of the likelihood that something would need to change before we were done and the "experimental" modifier would make the auto-upgrader handling slightly less ugly. That same reasoning seems like it would apply here.

I don't mind changing the name to 'llvm.experimental.vp.add', etc . The mid to long-term goal is first-class VP instructions in any case.

Second, I'm not 100% comfortable with how the explicit vector length argument is represented. Yes, I know, I should have brought this up about a year ago. I don't have any objections to the existence of this argument. I understand that we need some way to handle it. I'm just wondering if we can handle it in some way that makes it more like an optional argument. I'm not entirely clear where this value comes from, but it seems like whatever generated it must know that we're generating SVE code, right? My concern is that for targets that don't do SVE kinds of things we still have this entirely redundant, if not meaningless, operand to carry around. What would you think of this being provided as an operand bundle when it is needed?

Operand bundles are not the same as optional parameters. Operand bundles pessimistically imply side-effects where as a plain i32 argument is innocuous to optimizations:

https://llvm.org/docs/LangRef.html#operand-bundles

Calls and invokes with operand bundles have unknown read / write effect on the heap on entry and exit (even if the call target is readnone or readonly), unless they’re overridden with callsite specific attributes.

That is reasonable for constrained fp intrinsics, which always have such a dependence. Many VP intrinsics however do not have any side effects or only inspect/modify memory through their pointer arguments, etc.

In D69891#1854257, @rkruppe wrote:

<snip>

It has no impact on code generation, since the canonical way to indicate that the "EVL feature" is not used by an instruction (the EVL argument being constant -1) is trivial to identify.

So I think that making this argument optional accomplishes nothing, it only makes support for targets that do care (RISC-V V, SX-Aurora VE) more complicated.

Yep. Thanks for clarifying this. Once LLVM supports optional parameters without adverse effects, we can re-visit this discussion until then i'd strongly prefer the EVL to be an explicit parameter.

Ah, okay, I misunderstood. Thanks for clarifying.

In D69891#1854113, @SjoerdMeijer wrote:

I'm not entirely clear where this value comes from, but it seems like whatever generated it must know that we're generating SVE code, right?

This is not architecture specific, and thus this is not assuming SVE. In the case of SVE, the vector length is unknown at compile time (it is a multiple of something), and constant at run-time. In the RISC-V vector extension, I believe it can be changed at any point. Thus, the way to obtain this value is by reading/writing a status register, or something similar, but relying on querying the architecture features. And whether it is constant at runtime, or can be changed at any point, this API seems to cover the different cases.

Alright, I was reaching for a more general concept that I obviously don't know the term for. What I meant to say was just that whatever generated the argument must know something specific about the target architecture.

In D69891#1854301, @simoll wrote:

Operand bundles are not the same as optional parameters. Operand bundles pessimistically imply side-effects where as a plain i32 argument is innocuous to optimizations:

https://llvm.org/docs/LangRef.html#operand-bundles

Calls and invokes with operand bundles have unknown read / write effect on the heap on entry and exit (even if the call target is readnone or readonly), unless they’re overridden with callsite specific attributes.

That is reasonable for constrained fp intrinsics, which always have such a dependence. Many VP intrinsics however do not have any side effects or only inspect/modify memory through their pointer arguments, etc.

We could add a callsite attribute when the intrinsic is created. We have that information about the intrinsic, right?

I'll admit that there may be no practical difference between what you've proposed and what I have in mind (in the abstract sense that assumes it's even possible to implement this in the way that I'm imagining). Mostly, I'm trying to explore what I perceive to be a bad smell in the IR. Having a parameter that is irrelevant for some targets just doesn't seem right. Introducing these intrinsics with "experimental" in the name would make me feel a bit better about that, even though again it has no practical effect.

In D69891#1854257, @rkruppe wrote:

The real question is, do targets without native support for EVL have anything to gain from making the argument optional?

I'm not sure I agree that is the real question. What is gained by not having the floating point constraint arguments always present even when they aren't needed? We have values that describe the default mode, so we could use those when we want the default mode. I think we all agree that they shouldn't be there when we don't need them, yet I would argue that those arguments are more relevant when "not needed" than the evl argument is.

I'm imagining walking someone through the IR, explaining what each instruction means. I'd be a bit embarrassed when I get to a vp intrinsic and say, "Ignore the last argument. We don't use it for this target."

But like I said above, this is much more a vague indication to me that we haven't got this right yet than a concrete problem.

andrew.w.kaylor added inline comments.Feb 3 2020, 5:26 PM

llvm/docs/LangRef.rst
15288	Nothing in this description makes it clear that the value should be negative for targets that don't support the explicit vector length argument. What should happen if this value isn't negative for a target that doesn't support it?
15292	What happens if the value isn't in this range? Based on the comments below it seems that the effective mask created should be as if %evl were equal to W. If it is a non-constant value will the generated code somehow enforce that behavior?

pengfei added a subscriber: pengfei.Feb 3 2020, 7:07 PM

In D69891#1856146, @andrew.w.kaylor wrote:

In D69891#1854113, @SjoerdMeijer wrote:

I'm not entirely clear where this value comes from, but it seems like whatever generated it must know that we're generating SVE code, right?

This is not architecture specific, and thus this is not assuming SVE. In the case of SVE, the vector length is unknown at compile time (it is a multiple of something), and constant at run-time. In the RISC-V vector extension, I believe it can be changed at any point. Thus, the way to obtain this value is by reading/writing a status register, or something similar, but relying on querying the architecture features. And whether it is constant at runtime, or can be changed at any point, this API seems to cover the different cases.

Alright, I was reaching for a more general concept that I obviously don't know the term for. What I meant to say was just that whatever generated the argument must know something specific about the target architecture.

In D69891#1854301, @simoll wrote:

Operand bundles are not the same as optional parameters. Operand bundles pessimistically imply side-effects where as a plain i32 argument is innocuous to optimizations:

https://llvm.org/docs/LangRef.html#operand-bundles

Calls and invokes with operand bundles have unknown read / write effect on the heap on entry and exit (even if the call target is readnone or readonly), unless they’re overridden with callsite specific attributes.

That is reasonable for constrained fp intrinsics, which always have such a dependence. Many VP intrinsics however do not have any side effects or only inspect/modify memory through their pointer arguments, etc.

We could add a callsite attribute when the intrinsic is created. We have that information about the intrinsic, right?

I'll admit that there may be no practical difference between what you've proposed and what I have in mind (in the abstract sense that assumes it's even possible to implement this in the way that I'm imagining). Mostly, I'm trying to explore what I perceive to be a bad smell in the IR. Having a parameter that is irrelevant for some targets just doesn't seem right. Introducing these intrinsics with "experimental" in the name would make me feel a bit better about that, even though again it has no practical effect.

In D69891#1854257, @rkruppe wrote:

The real question is, do targets without native support for EVL have anything to gain from making the argument optional?

I'm not sure I agree that is the real question. What is gained by not having the floating point constraint arguments always present even when they aren't needed? We have values that describe the default mode, so we could use those when we want the default mode. I think we all agree that they shouldn't be there when we don't need them, yet I would argue that those arguments are more relevant when "not needed" than the evl argument is.

I'm imagining walking someone through the IR, explaining what each instruction means. I'd be a bit embarrassed when I get to a vp intrinsic and say, "Ignore the last argument. We don't use it for this target."

But like I said above, this is much more a vague indication to me that we haven't got this right yet than a concrete problem.

To me all of this points in the direction that we'd want a standard mechanism for optional parameters. We have three options for this right now:
a) Come up with a novel, clean slate scheme for optional parmeters with default values (eg constrained fp default fp env values, i32 -1 for VP intrinsics). Some examples:

; all optional parameters set (eg RISC-V V or SX-Aurora)
%x= llvm.fp.add(%a, %b) mask(%mask), evl(%evl), fpenv(fpround.tonearest, fpexcept.strict)

; constrained fp SIMD intrinsic (eg AVX512)
%y= llvm.fp.add(%a, %b) mask(%mask), fpenv(fpround.tonearest, fpexcept.strict)

; unpredicated, default fp env fp add (isomorphic to the fadd instruction)
%z= llvm.fp.add(%a, %b)

This is the solution that i'd want to see in LLVM to solve the optional parameter problem once and for all for constrained fp *and* VP (we should look out for other potential applications and start an RFC).

b) OpBundles (imply side-effects unless overriden with callsite attributes) - clunky, could form the starting point for a) though.
c) A more exotic solution: add an extra opaque parameter and provide different intrinsics to produce the opaque value (i'm not aware of this being used anywhere right now). This is similar to the idea that @sdesmalen brought up (https://reviews.llvm.org/D69891#inline-632851), however, with the produced mask having an opaque type such that it behaves basically like an optional parameter tuple, ie:

declare opaquety  @llvm.vp.mask(<8 x i1>)
declare opaquety @llvm.vp.evlmask(<8 x i1>, i32 %evl)
declare <8 x float> @llvm.vp.fadd(%a, %b, opaquety mask)

For constrained fp, this could look as follows

declare opaquety @llvm.constrainedfp.fpenv(metadata, metadata)
declare opaquety @llvm.constrained.fadd(%a, %b, opaquety fpenv)

Note that we could allow chaining of these intrinsics (eg add a passthru opquety parameter to the defining intrinsics):

opaquety %fpenv = llvm.constrainedfp.fpenv(opquety undef, fpround.toneares, fpexcept.strict)
opaquety %opt_params = llvm.vp.evlmask(%fpenv, <8 x ii1>%mask, i32 %evl)
%c = llvm.fadd(%a, %b,  opt_params)

BUT before any of these solutions are adopted, i'd rather we model the optional parameters as standard, explicit parameters with default values. We can pursue tha general solution for the optional parameter problem in parallel.
What do you think?

llvm/docs/LangRef.rst
15288	Nothing in this description makes it clear that the value should be negative for targets that don't support the explicit vector length argument. Good point. I'll add that to the description. What should happen if this value isn't negative for a target that doesn't support it? There is a lowering pass that folds `%evl` into the mask, just as for reduction intrinsics.
15292	We are discussing the case `%evl >= W` in the main RFC, right now https://reviews.llvm.org/D57504#1854330 .

I like your first proposal for optional parameters. It "feels right" to me, and I agree that it is an improvement on operand bundles. Obviously it would take some time to build consensus for something like that as a new IR construct. I can live with the explicit parameter for now if we can agree on rules for how it is defined to behave on targets where it does not apply.

Ok. This is a summary of the requested changes and i'd like to get your go ahead before "committing" them to the patch:

Define what 'W' is (that is the vector length that %evl is compared against) - for static vector types this would be the number of elements of the vector, for scalable types W == MVL and the target is responsible for defining MVL.
Define what happens when %evl > W - since we do not know now what is best, i propose we define this as UB. This leaves us some room to revisit that decision should it turn out that some form of defined behavior would be more reasonable.
Mention that %evl is discouraged for non-VL targets - TTI tells people whether VL is supported or not for every target. If used nevertheless, the ExpandVectorPredicationPass will fold %evl into the mask. "not-using-%evl" means setting %evl = W or %evl = -1.
%evl and %mask are parameters - i will start an RFC on llvm-dev towards optional function parameters (as sketched here under variant a): https://reviews.llvm.org/D69891#1856580).
Rename to llvm.experimental.vp.* - Inserting experimental is the preferred way to introduce new intrinsics until they are stable (for technical reasons brought up by @chandlerc - https://reviews.llvm.org/D69891#1852795 ).

In D69891#1871485, @simoll wrote:

Ok. This is a summary of the requested changes and i'd like to get your go ahead before "committing" them to the patch:

Define what 'W' is (that is the vector length that %evl is compared against) - for static vector types this would be the number of elements of the vector, for scalable types W == MVL and the target is responsible for defining MVL.

I object. The only sensible definition for the upper bound on %evl is the number of elements in the vector type in question (because %evl counts vector elements). We already have a concept of "number of elements in a vector type" that generalizes across all vector types (fixed-width and scalable), and it is quite target-independent: <vscale x $N x $elemtype > (where $N is a constant) has vscale * $N elements. The only target-dependent part is in the positive integer vscale, and its behavior is quite restricted. All of this is already stated in the LangRef (in the section on vector types), and there is no reason for VP intrinsics to repeat it, let alone deviate from it by e.g. introducing new terminology (MVL) or making provisions for more target-dependent behavior than actually exists. The VP instructions should just define their behavior in terms of the number of elements in the vector.

In D69891#1872419, @rkruppe wrote:

In D69891#1871485, @simoll wrote:

Ok. This is a summary of the requested changes and i'd like to get your go ahead before "committing" them to the patch:

Define what 'W' is (that is the vector length that %evl is compared against) - for static vector types this would be the number of elements of the vector, for scalable types W == MVL and the target is responsible for defining MVL.

I object. The only sensible definition for the upper bound on %evl is the number of elements in the vector type in question (because %evl counts vector elements). We already have a concept of "number of elements in a vector type" that generalizes across all vector types (fixed-width and scalable), and it is quite target-independent: <vscale x $N x $elemtype > (where $N is a constant) has vscale * $N elements. The only target-dependent part is in the positive integer vscale, and its behavior is quite restricted. All of this is already stated in the LangRef (in the section on vector types), and there is no reason for VP intrinsics to repeat it, let alone deviate from it by e.g. introducing new terminology (MVL) or making provisions for more target-dependent behavior than actually exists. The VP instructions should just define their behavior in terms of the number of elements in the vector.

+1
I was complicating things. MVL == number of vector elements and the whole discussion about what MVL is and how it is read/set really is part of the scalable type/vscale discussion.

In D69891#1871485, @simoll wrote:

Rename to llvm.experimental.vp.* - Inserting experimental is the preferred way to introduce new intrinsics until they are stable (for technical reasons brought up by @chandlerc - https://reviews.llvm.org/D69891#1852795 ).

I feel like I should mention that I don't know how Chandler feels about the use of "experimental" for these intrinsics. I wasn't trying to claim his approval of my suggestion, merely explaining that I thought the reasoning that led to the current naming of the constrained FP intrinsics probably applied here as well. After I made that suggestion @craig.topper pointed out to me that the "experimental" qualifier has a tendency to never go away. See, for example, vector add/reduce intriniscs. All this is to say that my suggestion is just a suggestion and I could be convinced to drop it if that is the consensus.

In D69891#1876867, @andrew.w.kaylor wrote:

In D69891#1871485, @simoll wrote:

Rename to llvm.experimental.vp.* - Inserting experimental is the preferred way to introduce new intrinsics until they are stable (for technical reasons brought up by @chandlerc - https://reviews.llvm.org/D69891#1852795 ).

I feel like I should mention that I don't know how Chandler feels about the use of "experimental" for these intrinsics. I wasn't trying to claim his approval of my suggestion, merely explaining that I thought the reasoning that led to the current naming of the constrained FP intrinsics probably applied here as well.

Well, to set this straight, i didn't mean to imply Chandler's approval. I just want to document what motivates the experimental prefix.

After I made that suggestion @craig.topper pointed out to me that the "experimental" qualifier has a tendency to never go away. See, for example, vector add/reduce intriniscs. All this is to say that my suggestion is just a suggestion and I could be convinced to drop it if that is the consensus.

How about we stick with llvm.vp.fadd and go for llvm.vp.fadd.v2, etc when/if the intrinsics are updated?

vkmr added a subscriber: vkmr.Feb 17 2020, 8:02 AM

In D69891#1878624, @simoll wrote:

After I made that suggestion @craig.topper pointed out to me that the "experimental" qualifier has a tendency to never go away. See, for example, vector add/reduce intriniscs. All this is to say that my suggestion is just a suggestion and I could be convinced to drop it if that is the consensus.

How about we stick with llvm.vp.fadd and go for llvm.vp.fadd.v2, etc when/if the intrinsics are updated?

I'm OK with that.

added section that %evl is disencouraged for non-evl targets
The VP intrinsic has UB if the effective EVL is out of bounds.
Use i32 -1 as special value for %evl (disabling %evl for this VP call).

Harbormaster completed remote builds in B47134: Diff 246212.Feb 24 2020, 8:22 AM

efriedma added a subscriber: efriedma.Feb 24 2020, 4:09 PM

efriedma added inline comments.

llvm/docs/LangRef.rst
15283	I assume for scalable vectors, the range of legal values for `%evl` is the runtime vector length? Maybe worth noting explicitly.
15296	I'm not happy with distinguishing between %evl equal to -1, vs. the IR constant -1. I mean, we can, but there are complications for other passes like, for example, constant hoisting. We have immarg for arguments that are always immediates, but we don't have a way to mark an argument that's sometimes an immediate. I don't have any great suggestions here... but one idea is to add a second type overload to the intrinsics: `i32` if the evl is present, `{}` if the evl is absent. So `@llvm.vp.add.v4i32.i32` is an add with an evl parameter, and `@llvm.vp.add.v4i32.sl_s` is an add without an evl parameter. I think that expresses the semantics correctly, but maybe it's not so readable.
15316	"discouraged"

Thanks for the review!
I'll update shortly with an explicit mention of scalable vector types in the langref.

llvm/docs/LangRef.rst
15296	I'm not happy with distinguishing between %evl equal to -1, vs. the IR constant -1. I mean, we can, but there are complications for other passes like, for example, constant hoisting. We have immarg for arguments that are always immediates, but we don't have a way to mark an argument that's sometimes an immediate. We could make `TTI:getIntIMmCostIntrin` return `0` whenever `i32 -1` is passed in VP intrinsics for evl. /// Return the expected cost of materialization for the given integer /// immediate of the specified type for a given instruction. The cost can be /// zero if the immediate can be folded into the specified instruction. int getIntImmCostIntrin(Intrinsic::ID IID, unsigned Idx, const APInt &Imm, Type *Ty) const; In any case, the special handling of `i32 -1` is really a makeshift solution. Mid to long term, we want to make evl (and the mask) argument optional parameters (https://lists.llvm.org/pipermail/llvm-dev/2020-February/139110.html). one idea is to add a second type overload to the intrinsics: i32 if the evl is present, {} if the evl is absent. So @llvm.vp.add.v4i32.i32 is an add with an evl parameter, and @llvm.vp.add.v4i32.sl_s is an add without an evl parameter. This might solve the representation issue in the IR. However, doing so would break the 1:1 correspondence between instruction opcodes and VP intrinsics, which is a nice property to have when optimizing VP code (eg for the generalized pattern matching logic in the VP reference patch).

efriedma added inline comments.Feb 24 2020, 5:35 PM

llvm/docs/LangRef.rst
15296	If you have a plan for this, that's fine.

new LangRef wording
updated 'VPIntrinsic::canIgnoreVectorLengthParam` to be in line with the i32 -1 constant being the off switch for evl.

Harbormaster completed remote builds in B47175: Diff 246352.Feb 24 2020, 6:15 PM

nfc: spelling, rebased

Harbormaster completed remote builds in B47208: Diff 246444.Feb 25 2020, 7:15 AM

andrew.w.kaylor added inline comments.Feb 25 2020, 3:14 PM

llvm/lib/IR/IntrinsicInst.cpp
276	Why are you doing this and not just "less than zero"?

simoll marked an inline comment as done.Feb 25 2020, 10:06 PM

simoll added inline comments.

llvm/lib/IR/IntrinsicInst.cpp
276	Because with the new LangRef wording %evl is unsigned and the `i32 -1` constant is handled as a special case.

andrew.w.kaylor added inline comments.Mar 4 2020, 3:53 PM

llvm/lib/IR/IntrinsicInst.cpp
276	I see that you also discussed this with Eli. I don't like the fact that constant -1 and a value that is 0xFFFFFFFF at runtime could have different behaviors. There's no telling when the constant folder might find a path to convert a value to a constant. I understand that the intrsincs aren't supposed to be called with values outside the range of [0, W] and has undefined behavior if it is, but having an exception for a constant does seem right. As a completely contrived example, suppose I had IR like this: bb1: br i1 %flag, label %bb2, label %bb3 bb2: %t1 = call <8 x i32> @llvm.vp.add.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n) br label %bb4 bb3: %t2 = call <8 x i32> @llvm.vp.add.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 -1) br label %bb4 bb4: %t3 = phi <8 x i32> [%t1, %bb1], [%t2, %bb2] ... Now some pass is going to transform that into: bb1: %evl = select i1 %flag, i32 %n, i32 -1 %t3 = call <8 x i32> @llvm.vp.add.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %evl) ... And now the -1 case has undefined behavior.

simoll marked 4 inline comments as done.Mar 5 2020, 12:46 AM

simoll added inline comments.

llvm/lib/IR/IntrinsicInst.cpp
276	That's a problem, yes. We could do away with the `-1` special case entirely. Then, to disable evl it has to be set to the number of vector elements. For a <W x ty> vector that's just the constant `W`, for scalable vector types that would be the appropriate `vscale` const exprssion.

Rebased
removed i32 -1 special case for EVL in VP intrinsics - 0 <= %evl < W is the only valid (== defined) setting now.

Harbormaster completed remote builds in B48580: Diff 249153.Mar 9 2020, 12:25 PM

Ping. The i32 -1 special case is gone and canIgnoreVectorLength works for both regular and scalable vector types. Are there any more comments or can this go in?

I'm satisfied with the functionality, but I'm not sure about the intrinsics having undefined behavior outside the [0, W] range. The way you've implemented it, it seems like the behavior would be predictable. If the evl argument is outside that range, it is ignored. Applying an unsigned value greater than W using the "%mask AND %EVLmask" also has this effect. Why not just make that the defined behavior?

llvm/lib/IR/IntrinsicInst.cpp
292	You can get here in one step with 'this->getModule()'

In D69891#1917277, @andrew.w.kaylor wrote:

I'm satisfied with the functionality, but I'm not sure about the intrinsics having undefined behavior outside the [0, W] range. The way you've implemented it, it seems like the behavior would be predictable. If the evl argument is outside that range, it is ignored.

To directly lower VP to NEC SX-Aurora %evl strictly needs to be within the range 0 to W or you get a hardware exception. Defining any behavior outside of that range thus implies additional instructions to restrict %evl to its bounds or guard the VP op. Clearly we do not want that. At the same time un-defining the behavior outside of that range does not hamper AVX512 code generation in any way.

Applying an unsigned value greater than W using the "%mask AND %EVLmask" also has this effect.

Semantically, yes. The difference is in the code generation.

[applying.. greater than W] Why not just make that the defined behavior?

When %evl is lowered to a mask there is still a risk of overflow in the comparison when the underlying vector type is widened (consider an operation on <256 x i8> elements and %evl ==258.. when that operation is widened to <512 x i8> you need to do something about that %EVLmask or you'll get spurious active bits in the upper half). If that is UB to begin you do not need to consider it in the %EVLmask computation. So, even non-AVL targets benefit from the strict defined range for %evl.

NFC

Rebased
getModule()

Harbormaster completed remote builds in B48957: Diff 249876.Mar 12 2020, 6:29 AM

OK. Since the behavior for out-of-range evl is target-dependent, undefined makes sense.

I don't know if you were waiting for input from anyone else, but this looks good to me.

This revision is now accepted and ready to land.Mar 12 2020, 10:51 AM

In D69891#1919830, @andrew.w.kaylor wrote:

OK. Since the behavior for out-of-range evl is target-dependent, undefined makes sense.

I don't know if you were waiting for input from anyone else, but this looks good to me.

Great :) Thanks to everybody involved for reviewing!
We've iterated on this patch for a while and the last major update was weeks ago, giving people enough time to react to it. I'll commit this today.

The next patch will introduce the ExpandVectorPredicationPass, which folds %evl into the mask if requested through TLI.

In D69891#1921124, @simoll wrote:

In D69891#1919830, @andrew.w.kaylor wrote:

OK. Since the behavior for out-of-range evl is target-dependent, undefined makes sense.

I don't know if you were waiting for input from anyone else, but this looks good to me.

Great :) Thanks to everybody involved for reviewing!
We've iterated on this patch for a while and the last major update was weeks ago, giving people enough time to react to it. I'll commit this today.

Given the scale of the change, it might be good to let people on llvm-dev know that a this patch is ready to land and it is about to be committed (and wait till early next week in case there's additional feedback).

In D69891#1921183, @fhahn wrote:

In D69891#1921124, @simoll wrote:

In D69891#1919830, @andrew.w.kaylor wrote:

OK. Since the behavior for out-of-range evl is target-dependent, undefined makes sense.

I don't know if you were waiting for input from anyone else, but this looks good to me.

Great :) Thanks to everybody involved for reviewing!
We've iterated on this patch for a while and the last major update was weeks ago, giving people enough time to react to it. I'll commit this today.

Given the scale of the change, it might be good to let people on llvm-dev know that a this patch is ready to land and it is about to be committed (and wait till early next week in case there's additional feedback).

Ok, let's wait until next week to give people more time to react. I will also update the reference patch in the main RFC noting that the integer VP patch was accepted.
However, i don't think it is necessary to bring this particular patch up on llvm-dev - there were several llvm-dev mails regarding VP in the past and we should have all interested parties as subscribers to the main RFC. Future patches are a completely different story - eg the VP-expansion pass and follow up patches should be announced on llvm-dev to make sure everybody is aware of them.

Rebased
Fixed off-by-one error in the Langref (the lane at position i with i == %evl is False).

Harbormaster completed remote builds in B49122: Diff 250173.Mar 13 2020, 5:08 AM

Closed by commit rG733b3199487a: [VP,Integer,#1] Vector-predicated integer intrinsics (authored by simoll). · Explain WhyMar 19 2020, 3:12 AM

This revision was automatically updated to reflect the committed changes.

Reading through the LangRef change post commit I was struck by this wording: "The VP intrinsic has undefined behavior if `%evl > W`."

This is a very strong interpretation and doesn't match what we do for arithmetic. One consequence of that difference is that hoisting a vp intrinsic out of a loop will not be legal unless we can prove the EVL is in bounds. You probably want some variant of a poison semantic propagation instead. Unless of course, real hardware happens to fault on out of bounds EVL in which case the current semantic is what you want. :)

Otherwise, nice work.

In D69891#1931607, @reames wrote:

Reading through the LangRef change post commit I was struck by this wording: "The VP intrinsic has undefined behavior if `%evl > W`."

This is a very strong interpretation and doesn't match what we do for arithmetic. One consequence of that difference is that hoisting a vp intrinsic out of a loop will not be legal unless we can prove the EVL is in bounds. You probably want some variant of a poison semantic propagation instead. Unless of course, real hardware happens to fault on out of bounds EVL in which case the current semantic is what you want. :)

Otherwise, nice work.

I brought this up in D57504 earlier, and @simoll said that some real hardware indeed faults in this case:

The VE target strictly requires VL <= MVL or you'll get a hardware exception.

lkcl added a subscriber: lkcl.Mar 20 2020, 3:26 AM

Revision Contents

Path

Size

llvm/

docs/

LangRef.rst

699 lines

include/

llvm/

Analysis/

TargetTransformInfo.h

14 lines

TargetTransformInfoImpl.h

4 lines

IR/

IntrinsicInst.h

42 lines

Intrinsics.td

77 lines

VPIntrinsics.def

84 lines

lib/

IR/

IntrinsicInst.cpp

113 lines

test/

Verifier/

vp-intrinsics.ll

34 lines

unittests/

IR/

CMakeLists.txt

1 line

VPIntrinsicTest.cpp

150 lines

utils/

TableGen/

CodeGenIntrinsics.h

3 lines

CodeGenTarget.cpp

6 lines

IntrinsicEmitter.cpp

13 lines

Diff 246352

llvm/docs/LangRef.rst

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 7,707 Lines • ▼ Show 20 Lines

Example:		Example:
""""""""		""""""""

.. code-block:: text		.. code-block:: text

<result> = fadd float 4.0, %var ; yields float:result = 4.0 + %var		<result> = fadd float 4.0, %var ; yields float:result = 4.0 + %var

		.. _i_sub:

'``sub``' Instruction		'``sub``' Instruction
^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""

::		::

▲ Show 20 Lines • Show All 79 Lines • ▼ Show 20 Lines
Example:		Example:
""""""""		""""""""

.. code-block:: text		.. code-block:: text

<result> = fsub float 4.0, %var ; yields float:result = 4.0 - %var		<result> = fsub float 4.0, %var ; yields float:result = 4.0 - %var
<result> = fsub float -0.0, %val ; yields float:result = -%var		<result> = fsub float -0.0, %val ; yields float:result = -%var

		.. _i_mul:

'``mul``' Instruction		'``mul``' Instruction
^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""

::		::

▲ Show 20 Lines • Show All 78 Lines • ▼ Show 20 Lines

Example:		Example:
""""""""		""""""""

.. code-block:: text		.. code-block:: text

<result> = fmul float 4.0, %var ; yields float:result = 4.0 * %var		<result> = fmul float 4.0, %var ; yields float:result = 4.0 * %var

		.. _i_udiv:

'``udiv``' Instruction		'``udiv``' Instruction
^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""

::		::

Show All 30 Lines

Example:		Example:
""""""""		""""""""

.. code-block:: text		.. code-block:: text

<result> = udiv i32 4, %var ; yields i32:result = 4 / %var		<result> = udiv i32 4, %var ; yields i32:result = 4 / %var

		.. _i_sdiv:

'``sdiv``' Instruction		'``sdiv``' Instruction
^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""

::		::

▲ Show 20 Lines • Show All 72 Lines • ▼ Show 20 Lines

Example:		Example:
""""""""		""""""""

.. code-block:: text		.. code-block:: text

<result> = fdiv float 4.0, %var ; yields float:result = 4.0 / %var		<result> = fdiv float 4.0, %var ; yields float:result = 4.0 / %var

		.. _i_urem:

'``urem``' Instruction		'``urem``' Instruction
^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""

::		::

Show All 28 Lines

Example:		Example:
""""""""		""""""""

.. code-block:: text		.. code-block:: text

<result> = urem i32 4, %var ; yields i32:result = 4 % %var		<result> = urem i32 4, %var ; yields i32:result = 4 % %var

		.. _i_srem:

'``srem``' Instruction		'``srem``' Instruction
^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""

::		::

▲ Show 20 Lines • Show All 97 Lines • ▼ Show 20 Lines
-------------------------		-------------------------

Bitwise binary operators are used to do various forms of bit-twiddling		Bitwise binary operators are used to do various forms of bit-twiddling
in a program. They are generally very efficient instructions and can		in a program. They are generally very efficient instructions and can
commonly be strength reduced from other instructions. They require two		commonly be strength reduced from other instructions. They require two
operands of the same type, execute an operation on them, and produce a		operands of the same type, execute an operation on them, and produce a
single value. The resulting value is the same type as its operands.		single value. The resulting value is the same type as its operands.

		.. _i_shl:

'``shl``' Instruction		'``shl``' Instruction
^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""

::		::

Show All 36 Lines
.. code-block:: text		.. code-block:: text

<result> = shl i32 4, %var ; yields i32: 4 << %var		<result> = shl i32 4, %var ; yields i32: 4 << %var
<result> = shl i32 4, 2 ; yields i32: 16		<result> = shl i32 4, 2 ; yields i32: 16
<result> = shl i32 1, 10 ; yields i32: 1024		<result> = shl i32 1, 10 ; yields i32: 1024
<result> = shl i32 1, 32 ; undefined		<result> = shl i32 1, 32 ; undefined
<result> = shl <2 x i32> < i32 1, i32 1>, < i32 1, i32 2> ; yields: result=<2 x i32> < i32 2, i32 4>		<result> = shl <2 x i32> < i32 1, i32 1>, < i32 1, i32 2> ; yields: result=<2 x i32> < i32 2, i32 4>

		.. _i_lshr:


'``lshr``' Instruction		'``lshr``' Instruction
^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""

::		::

Show All 33 Lines	.. code-block:: text

<result> = lshr i32 4, 1 ; yields i32:result = 2		<result> = lshr i32 4, 1 ; yields i32:result = 2
<result> = lshr i32 4, 2 ; yields i32:result = 1		<result> = lshr i32 4, 2 ; yields i32:result = 1
<result> = lshr i8 4, 3 ; yields i8:result = 0		<result> = lshr i8 4, 3 ; yields i8:result = 0
<result> = lshr i8 -2, 1 ; yields i8:result = 0x7F		<result> = lshr i8 -2, 1 ; yields i8:result = 0x7F
<result> = lshr i32 1, 32 ; undefined		<result> = lshr i32 1, 32 ; undefined
<result> = lshr <2 x i32> < i32 -2, i32 4>, < i32 1, i32 2> ; yields: result=<2 x i32> < i32 0x7FFFFFFF, i32 1>		<result> = lshr <2 x i32> < i32 -2, i32 4>, < i32 1, i32 2> ; yields: result=<2 x i32> < i32 0x7FFFFFFF, i32 1>

		.. _i_ashr:

'``ashr``' Instruction		'``ashr``' Instruction
^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""

::		::

Show All 34 Lines	.. code-block:: text

<result> = ashr i32 4, 1 ; yields i32:result = 2		<result> = ashr i32 4, 1 ; yields i32:result = 2
<result> = ashr i32 4, 2 ; yields i32:result = 1		<result> = ashr i32 4, 2 ; yields i32:result = 1
<result> = ashr i8 4, 3 ; yields i8:result = 0		<result> = ashr i8 4, 3 ; yields i8:result = 0
<result> = ashr i8 -2, 1 ; yields i8:result = -1		<result> = ashr i8 -2, 1 ; yields i8:result = -1
<result> = ashr i32 1, 32 ; undefined		<result> = ashr i32 1, 32 ; undefined
<result> = ashr <2 x i32> < i32 -2, i32 4>, < i32 1, i32 3> ; yields: result=<2 x i32> < i32 -1, i32 0>		<result> = ashr <2 x i32> < i32 -2, i32 4>, < i32 1, i32 3> ; yields: result=<2 x i32> < i32 -1, i32 0>

		.. _i_and:

'``and``' Instruction		'``and``' Instruction
^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""

::		::

Show All 33 Lines
""""""""		""""""""

.. code-block:: text		.. code-block:: text

<result> = and i32 4, %var ; yields i32:result = 4 & %var		<result> = and i32 4, %var ; yields i32:result = 4 & %var
<result> = and i32 15, 40 ; yields i32:result = 8		<result> = and i32 15, 40 ; yields i32:result = 8
<result> = and i32 4, 8 ; yields i32:result = 0		<result> = and i32 4, 8 ; yields i32:result = 0

		.. _i_or:

'``or``' Instruction		'``or``' Instruction
^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""

::		::

Show All 33 Lines
""""""""		""""""""

::		::

<result> = or i32 4, %var ; yields i32:result = 4 \| %var		<result> = or i32 4, %var ; yields i32:result = 4 \| %var
<result> = or i32 15, 40 ; yields i32:result = 47		<result> = or i32 15, 40 ; yields i32:result = 47
<result> = or i32 4, 8 ; yields i32:result = 12		<result> = or i32 4, 8 ; yields i32:result = 12

		.. _i_xor:

'``xor``' Instruction		'``xor``' Instruction
^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^

Syntax:		Syntax:
"""""""		"""""""

::		::

▲ Show 20 Lines • Show All 6,787 Lines • ▼ Show 20 Lines
""""""""""		""""""""""

On some architectures the address of the code to be executed needs to be		On some architectures the address of the code to be executed needs to be
different than the address where the trampoline is actually stored. This		different than the address where the trampoline is actually stored. This
intrinsic returns the executable address corresponding to ``tramp``		intrinsic returns the executable address corresponding to ``tramp``
after performing the required machine specific adjustments. The pointer		after performing the required machine specific adjustments. The pointer
returned can then be :ref:`bitcast and executed <int_trampoline>`.		returned can then be :ref:`bitcast and executed <int_trampoline>`.


		.. _int_vp:

		Vector Predication Intrinsics
		-----------------------------
		VP intrinsics are intended for predicated SIMD/vector code. A typical VP
		operation takes a vector mask and an explicit vector length parameter as in:

		::

		<W x T> llvm.vp.<opcode>.*(<W x T> %x, <W x T> %y, <W x i1> %mask, i32 %evl)
		sdesmalenUnsubmitted Done Reply Inline Actions I expect this to also work for scalable vectors, so maybe add a case here for `<vscale x W x T>` as well? sdesmalen: I expect this to also work for scalable vectors, so maybe add a case here for `<vscale x W x…
		jdoerfertUnsubmitted Done Reply Inline Actions If that support is added later, I would add the wording later too. jdoerfert: If that support is added later, I would add the wording later too.
		sdesmalenUnsubmitted Done Reply Inline Actions @simoll is anything special needed to support scalable vectors in this context? sdesmalen: @simoll is anything special needed to support scalable vectors in this context?
		simollAuthorUnsubmitted Done Reply Inline Actions @simoll is anything special needed to support scalable vectors in this context? The intrinsics support scalable vectors out of the box. I've added VP intrinsics with scalable types to the individual operations below. simoll: > @simoll is anything special needed to support scalable vectors in this context? The…
		efriedmaUnsubmitted Done Reply Inline Actions I assume for scalable vectors, the range of legal values for `%evl` is the runtime vector length? Maybe worth noting explicitly. efriedma: I assume for scalable vectors, the range of legal values for `%evl` is the runtime vector…

		The vector mask parameter (%mask) always has a vector of `i1` type, for example
		sdesmalenUnsubmitted Done Reply Inline Actions nit: bit -> i1 sdesmalen: nit: bit -> i1
		`<32 x i1>`. The explicit vector length parameter always has the type `i32` and
		is an unsigned integer value. The explicit vector length parameter (%evl) is either
		SjoerdMeijerUnsubmitted Done Reply Inline Actions Nit: instead of talking about the MSB being zero, is it simpler to say that it is effective if is a positive number (if that's what you mean)? SjoerdMeijer: Nit: instead of talking about the MSB being zero, is it simpler to say that it is effective if…
		simollAuthorUnsubmitted Done Reply Inline Actions Good point. I'll rewrite that part referring to the vlen as a signed integer. simoll: Good point. I'll rewrite that part referring to the vlen as a signed integer.
		sdesmalenUnsubmitted Done Reply Inline Actions Thanks, I wondered something similar. I had actually expected the `%evl` to only make sense in the context of scalable vectors? Is it worth expressing what it means if `%evl` is larger than the size of the vector? sdesmalen: Thanks, I wondered something similar. I had actually expected the `%evl` to only make sense…
		simollAuthorUnsubmitted Done Reply Inline Actions I had actually expected the %evl to only make sense in the context of scalable vectors? EVL does not require scalable vectors. We'll use it with the `<256 x double>` type for example. simoll: > I had actually expected the %evl to only make sense in the context of scalable vectors? EVL…
		the IR constant ``i32 -1`` or %evl is in the range
		andrew.w.kaylorUnsubmitted Done Reply Inline Actions Nothing in this description makes it clear that the value should be negative for targets that don't support the explicit vector length argument. What should happen if this value isn't negative for a target that doesn't support it? andrew.w.kaylor: Nothing in this description makes it clear that the value should be negative for targets that…
		simollAuthorUnsubmitted Done Reply Inline Actions Nothing in this description makes it clear that the value should be negative for targets that don't support the explicit vector length argument. Good point. I'll add that to the description. What should happen if this value isn't negative for a target that doesn't support it? There is a lowering pass that folds `%evl` into the mask, just as for reduction intrinsics. simoll: > Nothing in this description makes it clear that the value should be negative for targets that…

		sdesmalenUnsubmitted Done Reply Inline Actions nit: 80 char limit (also for other places in this file) sdesmalen: nit: 80 char limit (also for other places in this file)
		::

		0 <= %evl <= W, where W is the number of vector elements
		andrew.w.kaylorUnsubmitted Done Reply Inline Actions What happens if the value isn't in this range? Based on the comments below it seems that the effective mask created should be as if %evl were equal to W. If it is a non-constant value will the generated code somehow enforce that behavior? andrew.w.kaylor: What happens if the value isn't in this range? Based on the comments below it seems that the…
		simollAuthorUnsubmitted Done Reply Inline Actions We are discussing the case `%evl >= W` in the main RFC, right now https://reviews.llvm.org/D57504#1854330 . simoll: We are discussing the case `%evl >= W` in the main RFC, right now https://reviews.llvm.

		Note that for :ref:`scalable vector types <t_vector>` ``W`` is the runtime
		length of the vector.

		efriedmaUnsubmitted Not Done Reply Inline Actions I'm not happy with distinguishing between %evl equal to -1, vs. the IR constant -1. I mean, we can, but there are complications for other passes like, for example, constant hoisting. We have immarg for arguments that are always immediates, but we don't have a way to mark an argument that's sometimes an immediate. I don't have any great suggestions here... but one idea is to add a second type overload to the intrinsics: `i32` if the evl is present, `{}` if the evl is absent. So `@llvm.vp.add.v4i32.i32` is an add with an evl parameter, and `@llvm.vp.add.v4i32.sl_s` is an add without an evl parameter. I think that expresses the semantics correctly, but maybe it's not so readable. efriedma: I'm not happy with distinguishing between %evl equal to -1, vs. the IR constant -1. I mean, we…
		simollAuthorUnsubmitted Done Reply Inline Actions I'm not happy with distinguishing between %evl equal to -1, vs. the IR constant -1. I mean, we can, but there are complications for other passes like, for example, constant hoisting. We have immarg for arguments that are always immediates, but we don't have a way to mark an argument that's sometimes an immediate. We could make `TTI:getIntIMmCostIntrin` return `0` whenever `i32 -1` is passed in VP intrinsics for evl. /// Return the expected cost of materialization for the given integer /// immediate of the specified type for a given instruction. The cost can be /// zero if the immediate can be folded into the specified instruction. int getIntImmCostIntrin(Intrinsic::ID IID, unsigned Idx, const APInt &Imm, Type Ty) const; In any case, the special handling of `i32 -1` is really a makeshift solution. Mid to long term, we want to make evl (and the mask) argument optional parameters (https://lists.llvm.org/pipermail/llvm-dev/2020-February/139110.html). one idea is to add a second type overload to the intrinsics: i32 if the evl is present, {} if the evl is absent. So @llvm.vp.add.v4i32.i32 is an add with an evl parameter, and @llvm.vp.add.v4i32.sl_s is an add without an evl parameter. This might solve the representation issue in the IR. However, doing so would break the 1:1 correspondence between instruction opcodes and VP intrinsics, which is a nice property to have when optimizing VP code (eg for the generalized pattern matching logic in the VP reference patch). simoll:* > I'm not happy with distinguishing between %evl equal to -1, vs. the IR constant -1. I mean…
		efriedmaUnsubmitted Not Done Reply Inline Actions If you have a plan for this, that's fine. efriedma: If you have a plan for this, that's fine.
		The VP intrinsic has undefined behavior for any other setting of %evl, for
		example if ``%evl > W``. The explicit vector length (%evl) is only effective
		when it is not the IR constant ``i32 -1``, and, when that is the case, it
		creates a mask, %EVLmask, with all elements ``0 <= i <= %evl`` set to True, and
		all other lanes ``%evl < i <= W`` to False. A new mask %M is calculated with an
		element-wise AND from %mask and %EVLmask:

		::

		M = %mask AND %EVLmask

		A vector operation ``<opcode>`` on vectors ``A`` and ``B`` calculates:

		::

		A <opcode> B = { A[i] <opcode> B[i] M[i] = True, and
		{ undef otherwise

		Optimization Hint
		^^^^^^^^^^^^^^^^^
		efriedmaUnsubmitted Done Reply Inline Actions "discouraged" efriedma: "discouraged"

		Some targets, such as AVX512, do not support the %evl parameter in hardware.
		The use of an effective %evl is discouraged for those targets. The function
		``TargetTransformInfo::hasActiveVectorLength()`` returns true when the target
		has native support for %evl.
		sdesmalenUnsubmitted Done Reply Inline Actions Have you considered adding an extra argument to specify the value of the false lanes, similar to how this is currently done for `llvm.masked.load`? By passing `undef`, it would have similar behaviour as the current definition, yet it would remove the need to add explicit `select` statements for handling merging- or zeroing predication if an instruction supports it. For SVE for example, most instructions have either merging or zeroing predication and the intrinsics expose the merging/zeroing/undef predication directly in the C/C++ intrinsic API. I think it would be really nice to have that capability represented in the vp intrinsics as well. sdesmalen: Have you considered adding an extra argument to specify the value of the false lanes, similar…
		simollAuthorUnsubmitted Done Reply Inline Actions Yes i considered that. However, as you suggested you can express this without loss with a `select` and i'd like to keep the intrinsics simple. simoll: Yes i considered that. However, as you suggested you can express this without loss with a…
		jdoerfertUnsubmitted Done Reply Inline Actions FWIW, I like the idea of simple intrinsics and explicit selection. jdoerfert: FWIW, I like the idea of simple intrinsics and explicit selection.
		sdesmalenUnsubmitted Done Reply Inline Actions If we want to keep the vp intrinsics simple, shouldn't we reconsider the need for `%evl` as an explicit parameter? That value can be folded into the predicate using a special intrinsic that enables the lanes upto `%evl`, e.g. `<256 x i1> llvm.vp.enable.lanes(<256 x i1> %p, i32 %evl)` If any operation needs `%evl` as an explicit parameter, it would be a matter of having a pattern that matches the enable.lanes intrinsic, similar to how merging/zeroing predication can be implemented by matching the selects. sdesmalen: If we want to keep the vp intrinsics simple, shouldn't we reconsider the need for `%evl` as an…
		simollAuthorUnsubmitted Done Reply Inline Actions If any operation needs %evl as an explicit parameter, it would be a matter of having a pattern that matches the enable.lanes intrinsic, similar to how merging/zeroing predication can be implemented by matching the selects. The SX-Aurora and RiSC-V V extension natively support an explicit vector length parameter just as they support predicated vector. Mask and EVL should be treated the same: if there is a vector mask parameter there should also be an EVL parameter. simoll: > If any operation needs %evl as an explicit parameter, it would be a matter of having a…
		sdesmalenUnsubmitted Done Reply Inline Actions I understand that certain instructions take an explicit vector length parameter, but that doesn't really explain why an arch-independent intrinsic needs to expose this. Is there any reason that modelling this using something like `llvm.vp.enable.lanes` won't work? sdesmalen: I understand that certain instructions take an explicit vector length parameter, but that…
		simollAuthorUnsubmitted Done Reply Inline Actions That's already two architectures that do support EVL on all vector instructions and it is vital to model it. EVL has performance implications (smaller EVL -> lower latency; no need to compute and store a mask where EVL can be used) . For SVE/MVE, you can simply turn the feature off by passing `i32 -1`. From the perspective of RVV and SX-Aurora, saying we do not need EVL is equivalent to saying we do not need vector predication. simoll: That's already two architectures that do support EVL on all vector instructions and it is…
		sdesmalenUnsubmitted Done Reply Inline Actions If we'd use a `select` instruction to implement the zeroing/merging predication, the issue with the current set of intrinsics is that we don't know how many lanes to select because the predicate mask is no longer the determining factor to limit which lanes are the valid lanes. Suggesting that other architectures can just turn off the feature by setting the `%evl` parameter to `-1` doesn't change the fact that the code generator still needs to handle the IR if `%evl` is not `-1`. It would be valid IR and there is nothing that guarantees LLVM won't generate the generic intrinsics with a value that is not `-1`. This is not really special to `select`, but common to other operations that take a mask as well. For our case it means we can't implement zeroing/merging predication with this set of intrinsics using `select`. If we want to solve the `select` issue and also keep the intrinsics simple, my suggestion was to combine the explicit vector length with the mask using an explicit intrinsic like `@llvm.vp.enable.lanes`. Because this is an explicit intrinsic, the code-generator can simply extract the `%evl` parameter and pass that directly to the instructions for RVV/SXA. This is what happens for many other intrinsics in LLVM already, like `masked.load`/`masked.gather` that support only a single addressing mode, where it is up to the code-generator to pick apart the value into operands that are suited for a more optimal load instruction. Without having heard your thoughts on this suggestion, I would have to guess that your reservation is the possibility of LLVM hoisting/separating the logic that merges predicate mask and `%evl` value in some way. That would mean having to do some tricks (think CodeGenPrep) to keep the values together and recognizable for CodeGen. And that's the exact same thing we would like to avoid for supporting merging/zeroing predication, hence the suggestion for the explicit `passthru` parameter. sdesmalen: If we'd use a `select` instruction to implement the zeroing/merging predication, the issue with…
		simollAuthorUnsubmitted Done Reply Inline Actions Suggesting that other architectures can just turn off the feature by setting the %evl parameter to -1 doesn't change the fact that the code generator still needs to handle the IR if %evl is not -1 Other targets do not need to consider `%evl` because there will be an `ExpandVectorPredication` pass that folds the EVL into the vector mask. Backends can request through TTI that the `%evl` be folded away and ignore that parameter from there on. An early version of that pass is included in the VP reference patch. The TTI changes and the expansion pass are planned for the next patch after this one. simoll: > Suggesting that other architectures can just turn off the feature by setting the %evl…
		cameron.mcinallyUnsubmitted Done Reply Inline Actions Yes i considered that. However, as you suggested you can express this without loss with a select and i'd like to keep the intrinsics simple. Having an explicit `passthru` is a good idea. Keeping the select glued to the intrinsics may be difficult. For example: `select %mask, %r, undef` It would be easy for a select peep to replace this with %r, since the masked bits are undefined. But for a situation like a trapping instruction, we wouldn't want the the masked elements to execute. cameron.mcinally: > Yes i considered that. However, as you suggested you can express this without loss with a…
		simollAuthorUnsubmitted Done Reply Inline Actions select %mask, %r, undef This is the current behavior - the values on masked-off lanes (by EVL or the mask) are `undef`. My problem with passthru is that it encourages people to specify arbitrary values on masked-off lanes and to do so often. Yes, for some combination of target architecture, instruction, passthru value and type it might be a perfect match for one machine instruction . But, for most combinations you will have to emit an extra `select` anyway. Besides, if the select is explicit and can be folded into another operation to simplify it, then that is actually a good thing. But there is more, for things like coalescing predicated registers and vector register spilling, it is helpful to have undef-on-masked-off and a separate select instruction that can be re-scheduled. simoll: > > select %mask, %r, undef This is the current behavior - the values on masked-off lanes (by…
		cameron.mcinallyUnsubmitted Done Reply Inline Actions Yes, you're right. Having an implicit undef and explicit merge is good. cameron.mcinally: Yes, you're right. Having an implicit undef and explicit merge is good.
		sdesmalenUnsubmitted Done Reply Inline Actions My problem with passthru is that it encourages people to specify arbitrary values on masked-off lanes and to do so often. Yes, for some combination of target architecture, instruction, passthru value and type it might be a perfect match for one machine instruction . But, for most combinations you will have to emit an extra select anyway. The most common passthru values in practice will be one of the source operands or zero/undef though. These are supported by SVE/SVE2 as a first-class feature (and I could imagine other/future vector architectures having special support for that as well). Other values can indeed be lowered to an explicit select instruction. But there is more, for things like coalescing predicated registers and vector register spilling, it is helpful to have undef-on-masked-off and a separate select instruction that can be re-scheduled. At that point the intrinsic will already be lowered to individual instructions, so I don't think the definition of the intrinsic has much of an impact on that. sdesmalen: > My problem with passthru is that it encourages people to specify arbitrary values on masked…
		SjoerdMeijerUnsubmitted Done Reply Inline Actions Don't think this discussion reached consensus, and I missed the nuances here earlier / changed my mind on it. The prior art on this also uses an explicit `passthru`, and it looks more generic than patching that up with an `select` later; you can always pass an `undef` in? I also read the description of %evl on line 14664 above again, and thought the definition could be tightened a bit, i.e. specify an upperbound, also just for clarity: The explicit vector length parameter always has the type `i32` and is a signed integer value. The explicit vector length is only effective if it is non-negative. Results are only computed for enabled lanes. A lane is enabled if the mask at that position is true and, if effective, where the lane position is below the explicit vector length. And then started drafting something to see if I could describe the proposed behaviour: The explicit vector length parameter always has the type `i32` and is a signed integer value. The explicit vector length (%evl) is only effective if it is non-negative, and when that is the case, the effective vector length is used to calculate a mask, and its value is: 0 <= evl <= W, where W is the vector length. This creates a mask, `%EVLmask`, with all elements `0 <= i <= evl` set to `True`, and all other lanes `evl < i <= W` to `False`. A new mask `%M` is calculated with an element-wise AND from `%mask` and `%EVLmask`: M = Vmask AND EVLmask A vector operation `<opcode>` on vectors `A` and `B` calculates: A <opcode> B = { A[i] <opcode> B[i] if M[i] = T, and { undef otherwise and if I'm not mistaken we are now discussing if `undef` here should be `undef` or a `passthru` SjoerdMeijer: Don't think this discussion reached consensus, and I missed the nuances here earlier / changed…
		rkruppeUnsubmitted Not Done Reply Inline Actions Sjoerd's summary matches my understanding of the question here. I believe so far nobody spelled out that there's actually a material difference between a `passthru` argument and a separate `select`. The mask that is readily available for the `select` (the `%mask` passed to the VP instruction) isn't the same as the "enabled lanes" computed from mask and EVL (`%M` in Sjoerd's exposition above), which is extremely undesirable to materialize on machines with a native equipvalent of EVL (such as SX-Aurora and RISC-V). In particular, if `%mask` is computed by another VP instruction, then lanes of `%mask` past the relevant EVL are undef, not false. That means a select like `select %mask, %v, zeroinitializer` does not actually make the disabled lanes zero, past EVL they are again undef (assuming `%x` and `%mask` were computed by VP instructions with the same EVL). In contrast, a passthru value would (presumably, I didn't see anyone object) apply to all disabled lanes. I don't know how much this matters, since usually consumers of the vector will also only work up to the same EVL. But there are exceptions, such as the code pattern for vectorizing (associative or `-ffast-math`) reductions for RISC-V V. Briefly put, that pattern is to accumulate a vector of partial (e.g.) sums in the loop, preserving lanes past the current EVL (which may be smaller than the EVL in some previous iterations) so they'll still participate in the final reduction after the loop. This can be expressed perfectly with `passthru`. Without `passthru`, that kind of code seems to require juggling two EVLs (the actual current EVL for this iteration, and the maximum EVL across all iterations) as well as materializing what Sjoerd called `%EVLmask` and `%M` above. This seems much harder for a backend to pattern-match and turn into acceptable code than the `select %m, vp.something(%a, %b, %m, -1), zeroinitializer` pattern. I previously felt that `passthru` would be nice to have for backend maintainers (including myself) but perhaps not worth the duplication of IR functionality (having two ways to do selects). However, given the differences I just described, I don't think "just use `select`" is workable. That doesn't necessarily mean we need `passthru` arguments on every VP instruction, I could also imagine adding a `vp.passthru(%v, %passthru, %mask, %evl)` intrinsic that takes from the first operand for all enabled lanes and the second for the other lanes. But I feel like once the floodgates are opened for doing merging "differently", it's kind of a toss-up whether it's simpler to create a special intrinsic for it or make it part of the general recipe for VP instructions. rkruppe: Sjoerd's summary matches my understanding of the question here. I believe so far nobody…
		simollAuthorUnsubmitted Done Reply Inline Actions A <opcode> B = { A[i] <opcode> B[i] if M[i] = T, and { undef otherwise That's an accurate description of `%evl` semantics. simoll: A <opcode> B = { A[i] <opcode> B[i] if M[i] = T, and { undef…


		.. _int_vp_add:

		'``llvm.vp.add.*``' Intrinsics
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		Syntax:
		"""""""
		This is an overloaded intrinsic.

		::

		SjoerdMeijerUnsubmitted Done Reply Inline Actions nit: perhaps more consistent `%avl` -> `%evl` ? SjoerdMeijer: nit: perhaps more consistent `%avl` -> `%evl` ?
		simollAuthorUnsubmitted Done Reply Inline Actions Yes. I'll change that to `%evl`. simoll: Yes. I'll change that to `%evl`.
		declare <16 x i32> @llvm.vp.add.v16i32 (<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
		declare <vscale x 4 x i32> @llvm.vp.add.nxv4i32 (<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
		declare <256 x i64> @llvm.vp.add.v256i64 (<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

		Overview:
		"""""""""

		Predicated integer addition of two vectors of integers.


		Arguments:
		""""""""""

		The first two operands and the result have the same vector of integer type. The
		third operand is the vector mask and has the same number of elements as the
		result vector type. The fourth operand is the explicit vector length of the
		operation.

		Semantics:
		""""""""""

		The '``llvm.vp.add``' intrinsic performs integer addition (:ref:`add <i_add>`)
		of the first and second vector operand on each enabled lane. The result on
		SjoerdMeijerUnsubmitted Done Reply Inline Actions Nit: please ignore if you disagree, but perhaps but perhaps shorter/simpler is: The first two operands and the result have the same vector of integer type. -> The operands and the result are integer vector types. (this applies to all/most descriptions) SjoerdMeijer: Nit: please ignore if you disagree, but perhaps but perhaps shorter/simpler is: The first…
		simollAuthorUnsubmitted Done Reply Inline Actions I'd like to shorten that. However, there are always mask and vlen operands so we cannot say that all operands have an integer vector type. simoll: I'd like to shorten that. However, there are always mask and vlen operands so we cannot say…
		disabled lanes is undefined.

		Examples:
		"""""""""

		.. code-block:: llvm

		%r = call <4 x i32> @llvm.vp.add.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
		;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

		%t = add <4 x i32> %a, %b
		%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> undef

		.. _int_vp_sub:

		'``llvm.vp.sub.*``' Intrinsics
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		Syntax:
		"""""""
		This is an overloaded intrinsic.

		::

		declare <16 x i32> @llvm.vp.sub.v16i32 (<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
		declare <vscale x 4 x i32> @llvm.vp.sub.nxv4i32 (<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
		declare <256 x i64> @llvm.vp.sub.v256i64 (<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

		Overview:
		"""""""""

		Predicated integer subtraction of two vectors of integers.


		Arguments:
		""""""""""

		The first two operands and the result have the same vector of integer type. The
		third operand is the vector mask and has the same number of elements as the
		result vector type. The fourth operand is the explicit vector length of the
		operation.

		Semantics:
		""""""""""

		The '``llvm.vp.sub``' intrinsic performs integer subtraction
		(:ref:`sub <i_sub>`) of the first and second vector operand on each enabled
		lane. The result on disabled lanes is undefined.

		Examples:
		"""""""""

		.. code-block:: llvm

		%r = call <4 x i32> @llvm.vp.sub.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
		;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

		%t = sub <4 x i32> %a, %b
		%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> undef



		.. _int_vp_mul:

		'``llvm.vp.mul.*``' Intrinsics
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		Syntax:
		"""""""
		This is an overloaded intrinsic.

		::

		declare <16 x i32> @llvm.vp.mul.v16i32 (<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
		declare <vscale x 4 x i32> @llvm.vp.mul.nxv46i32 (<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
		declare <256 x i64> @llvm.vp.mul.v256i64 (<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

		Overview:
		"""""""""

		Predicated integer multiplication of two vectors of integers.


		Arguments:
		""""""""""

		The first two operands and the result have the same vector of integer type. The
		third operand is the vector mask and has the same number of elements as the
		result vector type. The fourth operand is the explicit vector length of the
		operation.

		Semantics:
		""""""""""
		The '``llvm.vp.mul``' intrinsic performs integer multiplication
		(:ref:`mul <i_mul>`) of the first and second vector operand on each enabled
		lane. The result on disabled lanes is undefined.

		Examples:
		"""""""""

		.. code-block:: llvm

		%r = call <4 x i32> @llvm.vp.mul.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
		;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

		%t = mul <4 x i32> %a, %b
		%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> undef


		.. _int_vp_sdiv:

		'``llvm.vp.sdiv.*``' Intrinsics
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		Syntax:
		"""""""
		This is an overloaded intrinsic.

		::

		declare <16 x i32> @llvm.vp.sdiv.v16i32 (<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
		declare <vscale x 4 x i32> @llvm.vp.sdiv.nxv4i32 (<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
		declare <256 x i64> @llvm.vp.sdiv.v256i64 (<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

		Overview:
		"""""""""

		Predicated, signed division of two vectors of integers.


		Arguments:
		""""""""""

		The first two operands and the result have the same vector of integer type. The
		third operand is the vector mask and has the same number of elements as the
		result vector type. The fourth operand is the explicit vector length of the
		operation.

		Semantics:
		""""""""""

		The '``llvm.vp.sdiv``' intrinsic performs signed division (:ref:`sdiv <i_sdiv>`)
		of the first and second vector operand on each enabled lane. The result on
		disabled lanes is undefined.

		Examples:
		"""""""""

		.. code-block:: llvm

		%r = call <4 x i32> @llvm.vp.sdiv.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
		;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

		%t = sdiv <4 x i32> %a, %b
		%also.r = select <4 x ii> %mask, <4 x i32> %t, <4 x i32> undef


		.. _int_vp_udiv:

		'``llvm.vp.udiv.*``' Intrinsics
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		Syntax:
		"""""""
		This is an overloaded intrinsic.

		::

		declare <16 x i32> @llvm.vp.udiv.v16i32 (<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
		declare <vscale x 4 x i32> @llvm.vp.udiv.nxv4i32 (<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
		declare <256 x i64> @llvm.vp.udiv.v256i64 (<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

		Overview:
		"""""""""

		Predicated, unsigned division of two vectors of integers.


		Arguments:
		""""""""""

		The first two operands and the result have the same vector of integer type. The third operand is the vector mask and has the same number of elements as the result vector type. The fourth operand is the explicit vector length of the operation.

		Semantics:
		""""""""""

		The '``llvm.vp.udiv``' intrinsic performs unsigned division
		(:ref:`udiv <i_udiv>`) of the first and second vector operand on each enabled
		lane. The result on disabled lanes is undefined.

		Examples:
		"""""""""

		.. code-block:: llvm

		%r = call <4 x i32> @llvm.vp.udiv.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
		;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

		%t = udiv <4 x i32> %a, %b
		%also.r = select <4 x ii> %mask, <4 x i32> %t, <4 x i32> undef



		.. _int_vp_srem:

		'``llvm.vp.srem.*``' Intrinsics
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		Syntax:
		"""""""
		This is an overloaded intrinsic.

		::

		declare <16 x i32> @llvm.vp.srem.v16i32 (<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
		declare <vscale x 4 x i32> @llvm.vp.srem.nxv4i32 (<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
		declare <256 x i64> @llvm.vp.srem.v256i64 (<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

		Overview:
		"""""""""

		Predicated computations of the signed remainder of two integer vectors.


		Arguments:
		""""""""""

		The first two operands and the result have the same vector of integer type. The
		third operand is the vector mask and has the same number of elements as the
		result vector type. The fourth operand is the explicit vector length of the
		operation.

		Semantics:
		""""""""""

		The '``llvm.vp.srem``' intrinsic computes the remainder of the signed division
		(:ref:`srem <i_srem>`) of the first and second vector operand on each enabled
		lane. The result on disabled lanes is undefined.

		Examples:
		"""""""""

		.. code-block:: llvm

		%r = call <4 x i32> @llvm.vp.srem.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
		;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

		%t = srem <4 x i32> %a, %b
		%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> undef



		.. _int_vp_urem:

		'``llvm.vp.urem.*``' Intrinsics
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		Syntax:
		"""""""
		This is an overloaded intrinsic.

		::

		declare <16 x i32> @llvm.vp.urem.v16i32 (<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
		declare <vscale x 4 x i32> @llvm.vp.urem.nxv4i32 (<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
		declare <256 x i64> @llvm.vp.urem.v256i64 (<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

		Overview:
		"""""""""

		Predicated computation of the unsigned remainder of two integer vectors.


		Arguments:
		""""""""""

		The first two operands and the result have the same vector of integer type. The
		third operand is the vector mask and has the same number of elements as the
		result vector type. The fourth operand is the explicit vector length of the
		operation.

		Semantics:
		""""""""""

		The '``llvm.vp.urem``' intrinsic computes the remainder of the unsigned division
		(:ref:`urem <i_urem>`) of the first and second vector operand on each enabled
		lane. The result on disabled lanes is undefined.

		Examples:
		"""""""""

		.. code-block:: llvm

		%r = call <4 x i32> @llvm.vp.urem.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
		;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

		%t = urem <4 x i32> %a, %b
		%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> undef


		.. _int_vp_ashr:

		'``llvm.vp.ashr.*``' Intrinsics
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		Syntax:
		"""""""
		This is an overloaded intrinsic.

		::

		declare <16 x i32> @llvm.vp.ashr.v16i32 (<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
		declare <vscale x 4 x i32> @llvm.vp.ashr.nxv4i32 (<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
		declare <256 x i64> @llvm.vp.ashr.v256i64 (<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

		Overview:
		"""""""""

		Vector-predicated arithmetic right-shift.


		Arguments:
		""""""""""

		The first two operands and the result have the same vector of integer type. The
		third operand is the vector mask and has the same number of elements as the
		result vector type. The fourth operand is the explicit vector length of the
		operation.

		Semantics:
		""""""""""

		The '``llvm.vp.ashr``' intrinsic computes the arithmetic right shift
		(:ref:`ashr <i_ashr>`) of the first operand by the second operand on each
		enabled lane. The result on disabled lanes is undefined.

		Examples:
		"""""""""

		.. code-block:: llvm

		%r = call <4 x i32> @llvm.vp.ashr.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
		;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

		%t = ashr <4 x i32> %a, %b
		%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> undef


		.. _int_vp_lshr:


		'``llvm.vp.lshr.*``' Intrinsics
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		Syntax:
		"""""""
		This is an overloaded intrinsic.

		::

		declare <16 x i32> @llvm.vp.lshr.v16i32 (<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
		declare <vscale x 4 x i32> @llvm.vp.lshr.nxv4i32 (<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
		declare <256 x i64> @llvm.vp.lshr.v256i64 (<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

		Overview:
		"""""""""

		Vector-predicated logical right-shift.


		Arguments:
		""""""""""

		The first two operands and the result have the same vector of integer type. The
		third operand is the vector mask and has the same number of elements as the
		result vector type. The fourth operand is the explicit vector length of the
		operation.

		Semantics:
		""""""""""

		The '``llvm.vp.lshr``' intrinsic computes the logical right shift
		(:ref:`lshr <i_lshr>`) of the first operand by the second operand on each
		enabled lane. The result on disabled lanes is undefined.

		Examples:
		"""""""""

		.. code-block:: llvm

		%r = call <4 x i32> @llvm.vp.lshr.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
		;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

		%t = lshr <4 x i32> %a, %b
		%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> undef


		.. _int_vp_shl:

		'``llvm.vp.shl.*``' Intrinsics
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		Syntax:
		"""""""
		This is an overloaded intrinsic.

		::

		declare <16 x i32> @llvm.vp.shl.v16i32 (<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
		declare <vscale x 4 x i32> @llvm.vp.shl.nxv4i32 (<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
		declare <256 x i64> @llvm.vp.shl.v256i64 (<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

		Overview:
		"""""""""

		Vector-predicated left shift.


		Arguments:
		""""""""""

		The first two operands and the result have the same vector of integer type. The
		third operand is the vector mask and has the same number of elements as the
		result vector type. The fourth operand is the explicit vector length of the
		operation.

		Semantics:
		""""""""""

		The '``llvm.vp.shl``' intrinsic computes the left shift (:ref:`shl <i_shl>`) of
		the first operand by the second operand on each enabled lane. The result on
		disabled lanes is undefined.

		Examples:
		"""""""""

		.. code-block:: llvm

		%r = call <4 x i32> @llvm.vp.shl.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
		;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

		%t = shl <4 x i32> %a, %b
		%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> undef


		.. _int_vp_or:

		'``llvm.vp.or.*``' Intrinsics
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		Syntax:
		"""""""
		This is an overloaded intrinsic.

		::

		declare <16 x i32> @llvm.vp.or.v16i32 (<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
		declare <vscale x 4 x i32> @llvm.vp.or.nxv4i32 (<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
		declare <256 x i64> @llvm.vp.or.v256i64 (<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

		Overview:
		"""""""""

		Vector-predicated or.


		Arguments:
		""""""""""

		The first two operands and the result have the same vector of integer type. The
		third operand is the vector mask and has the same number of elements as the
		result vector type. The fourth operand is the explicit vector length of the
		operation.

		Semantics:
		""""""""""

		The '``llvm.vp.or``' intrinsic performs a bitwise or (:ref:`or <i_or>`) of the
		first two operands on each enabled lane. The result on disabled lanes is
		undefined.

		Examples:
		"""""""""

		.. code-block:: llvm

		%r = call <4 x i32> @llvm.vp.or.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
		;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

		%t = or <4 x i32> %a, %b
		%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> undef


		.. _int_vp_and:

		'``llvm.vp.and.*``' Intrinsics
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		Syntax:
		"""""""
		This is an overloaded intrinsic.

		::

		declare <16 x i32> @llvm.vp.and.v16i32 (<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
		declare <vscale x 4 x i32> @llvm.vp.and.nxv4i32 (<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
		declare <256 x i64> @llvm.vp.and.v256i64 (<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

		Overview:
		"""""""""

		Vector-predicated and.


		Arguments:
		""""""""""

		The first two operands and the result have the same vector of integer type. The
		third operand is the vector mask and has the same number of elements as the
		result vector type. The fourth operand is the explicit vector length of the
		operation.

		Semantics:
		""""""""""

		The '``llvm.vp.and``' intrinsic performs a bitwise and (:ref:`and <i_or>`) of
		the first two operands on each enabled lane. The result on disabled lanes is
		undefined.

		Examples:
		"""""""""

		.. code-block:: llvm

		%r = call <4 x i32> @llvm.vp.and.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
		;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

		%t = and <4 x i32> %a, %b
		%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> undef


		.. _int_vp_xor:

		'``llvm.vp.xor.*``' Intrinsics
		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		Syntax:
		"""""""
		This is an overloaded intrinsic.

		::

		declare <16 x i32> @llvm.vp.xor.v16i32 (<16 x i32> <left_op>, <16 x i32> <right_op>, <16 x i1> <mask>, i32 <vector_length>)
		declare <vscale x 4 x i32> @llvm.vp.xor.nxv4i32 (<vscale x 4 x i32> <left_op>, <vscale x 4 x i32> <right_op>, <vscale x 4 x i1> <mask>, i32 <vector_length>)
		declare <256 x i64> @llvm.vp.xor.v256i64 (<256 x i64> <left_op>, <256 x i64> <right_op>, <256 x i1> <mask>, i32 <vector_length>)

		Overview:
		"""""""""

		Vector-predicated, bitwise xor.


		Arguments:
		""""""""""

		The first two operands and the result have the same vector of integer type. The
		third operand is the vector mask and has the same number of elements as the
		result vector type. The fourth operand is the explicit vector length of the
		operation.

		Semantics:
		""""""""""

		The '``llvm.vp.xor``' intrinsic performs a bitwise xor (:ref:`xor <i_xor>`) of
		the first two operands on each enabled lane.
		The result on disabled lanes is undefined.

		Examples:
		"""""""""

		.. code-block:: llvm

		%r = call <4 x i32> @llvm.vp.xor.v4i32(<4 x i32> %a, <4 x i32> %b, <4 x i1> %mask, i32 %evl)
		;; For all lanes below %evl, %r is lane-wise equivalent to %also.r

		%t = xor <4 x i32> %a, %b
		%also.r = select <4 x i1> %mask, <4 x i32> %t, <4 x i32> undef


.. _int_mload_mstore:		.. _int_mload_mstore:

Masked Vector Load and Store Intrinsics		Masked Vector Load and Store Intrinsics
---------------------------------------		---------------------------------------

LLVM provides intrinsics for predicated vector load and store operations. The predicate is specified by a mask operand, which holds one bit per vector element, switching the associated vector lane on or off. The memory addresses corresponding to the "off" lanes are not accessed. When all bits of the mask are on, the intrinsic is identical to a regular vector load or store. When all bits are off, no memory is accessed.		LLVM provides intrinsics for predicated vector load and store operations. The predicate is specified by a mask operand, which holds one bit per vector element, switching the associated vector lane on or off. The memory addresses corresponding to the "off" lanes are not accessed. When all bits of the mask are on, the intrinsic is identical to a regular vector load or store. When all bits are off, no memory is accessed.

.. _int_mload:		.. _int_mload:
▲ Show 20 Lines • Show All 3,674 Lines • Show Last 20 Lines

llvm/include/llvm/Analysis/TargetTransformInfo.h

Show First 20 Lines • Show All 1,147 Lines • ▼ Show 20 Lines
/// \returns True if the target wants to expand the given reduction intrinsic		/// \returns True if the target wants to expand the given reduction intrinsic
/// into a shuffle sequence.		/// into a shuffle sequence.
bool shouldExpandReduction(const IntrinsicInst *II) const;		bool shouldExpandReduction(const IntrinsicInst *II) const;

/// \returns the size cost of rematerializing a GlobalValue address relative		/// \returns the size cost of rematerializing a GlobalValue address relative
/// to a stack reload.		/// to a stack reload.
unsigned getGISelRematGlobalCost() const;		unsigned getGISelRematGlobalCost() const;

		/// \name Vector Predication Information
		/// @{
		/// Whether the target supports the %evl parameter of VP intrinsic efficiently in hardware.
		/// (see LLVM Language Reference - "Vector Predication Intrinsics")
		/// Use of %evl is disencouraged when that is not the case.
		bool hasActiveVectorLength() const;

		/// @}

/// @}		/// @}

private:		private:
/// Estimate the latency of specified instruction.		/// Estimate the latency of specified instruction.
/// Returns 1 as the default value.		/// Returns 1 as the default value.
int getInstructionLatency(const Instruction *I) const;		int getInstructionLatency(const Instruction *I) const;

/// Returns the expected throughput cost of the instruction.		/// Returns the expected throughput cost of the instruction.
▲ Show 20 Lines • Show All 231 Lines • ▼ Show 20 Lines	virtual unsigned getLoadVectorFactor(unsigned VF, unsigned LoadSize,
VectorType *VecTy) const = 0;		VectorType *VecTy) const = 0;
virtual unsigned getStoreVectorFactor(unsigned VF, unsigned StoreSize,		virtual unsigned getStoreVectorFactor(unsigned VF, unsigned StoreSize,
unsigned ChainSizeInBytes,		unsigned ChainSizeInBytes,
VectorType *VecTy) const = 0;		VectorType *VecTy) const = 0;
virtual bool useReductionIntrinsic(unsigned Opcode, Type *Ty,		virtual bool useReductionIntrinsic(unsigned Opcode, Type *Ty,
ReductionFlags) const = 0;		ReductionFlags) const = 0;
virtual bool shouldExpandReduction(const IntrinsicInst *II) const = 0;		virtual bool shouldExpandReduction(const IntrinsicInst *II) const = 0;
virtual unsigned getGISelRematGlobalCost() const = 0;		virtual unsigned getGISelRematGlobalCost() const = 0;
		virtual bool hasActiveVectorLength() const = 0;
virtual int getInstructionLatency(const Instruction *I) = 0;		virtual int getInstructionLatency(const Instruction *I) = 0;
};		};

template <typename T>		template <typename T>
class TargetTransformInfo::Model final : public TargetTransformInfo::Concept {		class TargetTransformInfo::Model final : public TargetTransformInfo::Concept {
T Impl;		T Impl;

public:		public:
▲ Show 20 Lines • Show All 468 Lines • ▼ Show 20 Lines	public:
bool shouldExpandReduction(const IntrinsicInst *II) const override {		bool shouldExpandReduction(const IntrinsicInst *II) const override {
return Impl.shouldExpandReduction(II);		return Impl.shouldExpandReduction(II);
}		}

unsigned getGISelRematGlobalCost() const override {		unsigned getGISelRematGlobalCost() const override {
return Impl.getGISelRematGlobalCost();		return Impl.getGISelRematGlobalCost();
}		}

		bool hasActiveVectorLength() const override {
		return Impl.hasActiveVectorLength();
		}

int getInstructionLatency(const Instruction *I) override {		int getInstructionLatency(const Instruction *I) override {
return Impl.getInstructionLatency(I);		return Impl.getInstructionLatency(I);
}		}
};		};

template <typename T>		template <typename T>
TargetTransformInfo::TargetTransformInfo(T Impl)		TargetTransformInfo::TargetTransformInfo(T Impl)
: TTIImpl(new Model<T>(Impl)) {}		: TTIImpl(new Model<T>(Impl)) {}
▲ Show 20 Lines • Show All 97 Lines • Show Last 20 Lines

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

Show First 20 Lines • Show All 612 Lines • ▼ Show 20 Lines	public:
bool shouldExpandReduction(const IntrinsicInst *II) const {		bool shouldExpandReduction(const IntrinsicInst *II) const {
return true;		return true;
}		}

unsigned getGISelRematGlobalCost() const {		unsigned getGISelRematGlobalCost() const {
return 1;		return 1;
}		}

		bool hasActiveVectorLength() const {
		return false;
		}

protected:		protected:
// Obtain the minimum required size to hold the value (without the sign)		// Obtain the minimum required size to hold the value (without the sign)
// In case of a vector it returns the min required size for one element.		// In case of a vector it returns the min required size for one element.
unsigned minRequiredElementSize(const Value* Val, bool &isSigned) {		unsigned minRequiredElementSize(const Value* Val, bool &isSigned) {
if (isa<ConstantDataVector>(Val) \|\| isa<ConstantVector>(Val)) {		if (isa<ConstantDataVector>(Val) \|\| isa<ConstantVector>(Val)) {
const auto* VectorValue = cast<Constant>(Val);		const auto* VectorValue = cast<Constant>(Val);

// In case of a vector need to pick the max between the min		// In case of a vector need to pick the max between the min
▲ Show 20 Lines • Show All 314 Lines • Show Last 20 Lines

llvm/include/llvm/IR/IntrinsicInst.h

Show First 20 Lines • Show All 200 Lines • ▼ Show 20 Lines	static bool classof(const IntrinsicInst *I) {
return I->getIntrinsicID() == Intrinsic::dbg_label;		return I->getIntrinsicID() == Intrinsic::dbg_label;
}		}
static bool classof(const Value *V) {		static bool classof(const Value *V) {
return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));		return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));
}		}
/// @}		/// @}
};		};

		/// This is the common base class for vector predication intrinsics.
		class VPIntrinsic : public IntrinsicInst {
		public:
		static Optional<int> GetMaskParamPos(Intrinsic::ID IntrinsicID);
		static Optional<int> GetVectorLengthParamPos(Intrinsic::ID IntrinsicID);

		/// The llvm.vp.* intrinsics for this instruction Opcode
		static Intrinsic::ID GetForOpcode(unsigned OC);

		// Whether \p ID is a VP intrinsic ID.
		static bool IsVPIntrinsic(Intrinsic::ID);

		/// \return the mask parameter or nullptr.
		Value *getMaskParam() const;

		/// \return the vector length parameter or nullptr.
		Value *getVectorLengthParam() const;

		/// \return whether the vector length param can be ignored.
		bool canIgnoreVectorLengthParam() const;

		/// \return the static element count (vector number of elements) the vector
		/// length parameter applies to.
		ElementCount getVectorLength() const;

		sdesmalenUnsubmitted Done Reply Inline Actions Can this be renamed to `getVectorLength()` and have it return `ElementCount` instead? (at which point we can drop the Optional) sdesmalen: Can this be renamed to `getVectorLength()` and have it return `ElementCount` instead? (at which…
		simollAuthorUnsubmitted Done Reply Inline Actions Sure. That actually makes it simpler. simoll: Sure. That actually makes it simpler.
		sdesmalenUnsubmitted Done Reply Inline Actions Awesome, thanks! sdesmalen: Awesome, thanks!
		// Methods for support type inquiry through isa, cast, and dyn_cast:
		static bool classof(const IntrinsicInst *I) {
		return IsVPIntrinsic(I->getIntrinsicID());
		}
		static bool classof(const Value *V) {
		return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));
		}

		// Equivalent non-predicated opcode
		unsigned getFunctionalOpcode() const {
		return GetFunctionalOpcodeForVP(getIntrinsicID());
		}

		// Equivalent non-predicated opcode
		static unsigned GetFunctionalOpcodeForVP(Intrinsic::ID ID);
		};

/// This is the common base class for constrained floating point intrinsics.		/// This is the common base class for constrained floating point intrinsics.
class ConstrainedFPIntrinsic : public IntrinsicInst {		class ConstrainedFPIntrinsic : public IntrinsicInst {
public:		public:
bool isUnaryOp() const;		bool isUnaryOp() const;
bool isTernaryOp() const;		bool isTernaryOp() const;
Optional<fp::RoundingMode> getRoundingMode() const;		Optional<fp::RoundingMode> getRoundingMode() const;
Optional<fp::ExceptionBehavior> getExceptionBehavior() const;		Optional<fp::ExceptionBehavior> getExceptionBehavior() const;

Show All 36 Lines	static bool classof(const IntrinsicInst *I) {
case Intrinsic::umul_with_overflow:		case Intrinsic::umul_with_overflow:
case Intrinsic::smul_with_overflow:		case Intrinsic::smul_with_overflow:
case Intrinsic::uadd_sat:		case Intrinsic::uadd_sat:
case Intrinsic::sadd_sat:		case Intrinsic::sadd_sat:
case Intrinsic::usub_sat:		case Intrinsic::usub_sat:
case Intrinsic::ssub_sat:		case Intrinsic::ssub_sat:
return true;		return true;
default:		default:
return false;		return false;
		jdoerfertUnsubmitted Done Reply Inline Actions I (started to) like generating these constructs. It will make additions later easier and less (copy&paste) error prone. If you put the mapping in a file with a macro #ifndef OPCODE2VPINTRINSIC #define OPCODE2VPINTRINSIC(Opcode, VPIntrinsic) #endif OPCODE2VPINTRINSIC(And, vp_and) ... #undef OPCODE2VPINTRINSIC you can replace all these long lists with codes like: #define OPCODE2VPINTRINSIC(Opcode, VPIntrinsic) case Instruction::Opcode: return Intrinsic::VPIntrinsic; #include ".../VPLanInstructions.def" jdoerfert: I (started to) like generating these constructs. It will make additions later easier and less…
		simollAuthorUnsubmitted Done Reply Inline Actions You are preaching to the choir here - i made it that way to stay closer to the other intrinsic classes. Update will have a macro file similar to `llvm/IR/Instruction.def`. simoll: You are preaching to the choir here - i made it that way to stay closer to the other intrinsic…
}		}
}		}
static bool classof(const Value *V) {		static bool classof(const Value *V) {
return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));		return isa<IntrinsicInst>(V) && classof(cast<IntrinsicInst>(V));
}		}

Value getLHS() const { return const_cast<Value>(getArgOperand(0)); }		Value getLHS() const { return const_cast<Value>(getArgOperand(0)); }
Value getRHS() const { return const_cast<Value>(getArgOperand(1)); }		Value getRHS() const { return const_cast<Value>(getArgOperand(1)); }
▲ Show 20 Lines • Show All 623 Lines • Show Last 20 Lines

llvm/include/llvm/IR/Intrinsics.td

	Show All 21 Lines
	// Intr*Mem - Memory properties. If no property is set, the worst case			// Intr*Mem - Memory properties. If no property is set, the worst case
	// is assumed (it may read and write any memory it can get access to and it may			// is assumed (it may read and write any memory it can get access to and it may
	// have other side effects).			// have other side effects).

	// IntrNoMem - The intrinsic does not access memory or have any other side			// IntrNoMem - The intrinsic does not access memory or have any other side
	// effects. It may be CSE'd deleted if dead, etc.			// effects. It may be CSE'd deleted if dead, etc.
	def IntrNoMem : IntrinsicProperty;			def IntrNoMem : IntrinsicProperty;

				// IntrNoSync - Threads executing the intrinsic will not synchronize using
				// memory or other means.
				def IntrNoSync : IntrinsicProperty;

	// IntrReadMem - This intrinsic only reads from memory. It does not write to			// IntrReadMem - This intrinsic only reads from memory. It does not write to
	// memory and has no other side effects. Therefore, it cannot be moved across			// memory and has no other side effects. Therefore, it cannot be moved across
	// potentially aliasing stores. However, it can be reordered otherwise and can			// potentially aliasing stores. However, it can be reordered otherwise and can
	// be deleted if dead.			// be deleted if dead.
	def IntrReadMem : IntrinsicProperty;			def IntrReadMem : IntrinsicProperty;

	// IntrWriteMem - This intrinsic only writes to memory, but does not read from			// IntrWriteMem - This intrinsic only writes to memory, but does not read from
	// memory, and has no other side effects. This means dead stores before calls			// memory, and has no other side effects. This means dead stores before calls
	▲ Show 20 Lines • Show All 1,110 Lines • ▼ Show 20 Lines

	// Intrinsic to detect whether its argument is a constant.			// Intrinsic to detect whether its argument is a constant.
	def int_is_constant : Intrinsic<[llvm_i1_ty], [llvm_any_ty], [IntrNoMem, IntrWillReturn], "llvm.is.constant">;			def int_is_constant : Intrinsic<[llvm_i1_ty], [llvm_any_ty], [IntrNoMem, IntrWillReturn], "llvm.is.constant">;

	// Intrinsic to mask out bits of a pointer.			// Intrinsic to mask out bits of a pointer.
	def int_ptrmask: Intrinsic<[llvm_anyptr_ty], [llvm_anyptr_ty, llvm_anyint_ty],			def int_ptrmask: Intrinsic<[llvm_anyptr_ty], [llvm_anyptr_ty, llvm_anyint_ty],
	[IntrNoMem, IntrSpeculatable, IntrWillReturn]>;			[IntrNoMem, IntrSpeculatable, IntrWillReturn]>;

				//===---------------- Vector Predication Intrinsics --------------===//

				// Binary operators
				let IntrProperties = [IntrNoMem, IntrNoSync, IntrWillReturn] in {
				def int_vp_add : Intrinsic<[ llvm_anyvector_ty ],
				[ LLVMMatchType<0>,
				LLVMMatchType<0>,
				LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
				llvm_i32_ty]>;
				def int_vp_sub : Intrinsic<[ llvm_anyvector_ty ],
				[ LLVMMatchType<0>,
				LLVMMatchType<0>,
				LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
				llvm_i32_ty]>;
				def int_vp_mul : Intrinsic<[ llvm_anyvector_ty ],
				[ LLVMMatchType<0>,
				LLVMMatchType<0>,
				LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
				llvm_i32_ty]>;
				def int_vp_sdiv : Intrinsic<[ llvm_anyvector_ty ],
				[ LLVMMatchType<0>,
				LLVMMatchType<0>,
				LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
				llvm_i32_ty]>;
				def int_vp_udiv : Intrinsic<[ llvm_anyvector_ty ],
				[ LLVMMatchType<0>,
				LLVMMatchType<0>,
				LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
				llvm_i32_ty]>;
				def int_vp_srem : Intrinsic<[ llvm_anyvector_ty ],
				[ LLVMMatchType<0>,
				LLVMMatchType<0>,
				LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
				llvm_i32_ty]>;
				def int_vp_urem : Intrinsic<[ llvm_anyvector_ty ],
				[ LLVMMatchType<0>,
				LLVMMatchType<0>,
				LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
				llvm_i32_ty]>;
				def int_vp_ashr : Intrinsic<[ llvm_anyvector_ty ],
				[ LLVMMatchType<0>,
				LLVMMatchType<0>,
				LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
				llvm_i32_ty]>;
				def int_vp_lshr : Intrinsic<[ llvm_anyvector_ty ],
				[ LLVMMatchType<0>,
				LLVMMatchType<0>,
				LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
				llvm_i32_ty]>;
				def int_vp_shl : Intrinsic<[ llvm_anyvector_ty ],
				[ LLVMMatchType<0>,
				LLVMMatchType<0>,
				LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
				llvm_i32_ty]>;
				def int_vp_or : Intrinsic<[ llvm_anyvector_ty ],
				[ LLVMMatchType<0>,
				LLVMMatchType<0>,
				LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
				llvm_i32_ty]>;
				def int_vp_and : Intrinsic<[ llvm_anyvector_ty ],
				[ LLVMMatchType<0>,
				LLVMMatchType<0>,
				LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
				llvm_i32_ty]>;
				def int_vp_xor : Intrinsic<[ llvm_anyvector_ty ],
				[ LLVMMatchType<0>,
				LLVMMatchType<0>,
				LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
				llvm_i32_ty]>;

				}
				jdoerfertUnsubmitted Done Reply Inline Actions I guess you can add readnone as an attribute, also nosync and nofree please. jdoerfert: I guess you can add readnone as an attribute, also nosync and nofree please.
				simollAuthorUnsubmitted Done Reply Inline Actions The intrinsics are `readnone` already. `nosync` seems sensible. simoll: The intrinsics are `readnone` already. `nosync` seems sensible.
				jdoerfertUnsubmitted Done Reply Inline Actions I asked @sstefan1 to add nosync and nofree to the td file and other places needed to use them. That should make your changes more concise. I would have argued we could even do the `#inline` trick here, or maybe some .td magic, to avoid this replication, but I will not force anything. jdoerfert: I asked @sstefan1 to add nosync and nofree to the td file and other places needed to use them.
				simollAuthorUnsubmitted Done Reply Inline Actions I would have argued we could even do the `#inline` trick here, or maybe some .td magic, to avoid this replication, but I will not force anything. I agree but refactoring `Intrinsics.td` is out of scope for this patch. simoll: > I would have argued we could even do the `#inline` trick here, or maybe some .td magic, to…
				sstefan1Unsubmitted Done Reply Inline Actions I kind of started some of this in D65377 and the approach will try and propose is an opt-out list. If I understood correctly I should try to do that part first (and propose). @jdoerfert correct me if I'm wrong. Hopefully I'll start this today. sstefan1: I kind of started some of this in D65377 and the approach will try and propose is an opt-out…
				jdoerfertUnsubmitted Done Reply Inline Actions @simoll Agreed. This is out of scope. @sstefan1 I think your plan sounds good. jdoerfert: @simoll Agreed. This is out of scope. @sstefan1 I think your plan sounds good.


	//===-------------------------- Masked Intrinsics -------------------------===//			//===-------------------------- Masked Intrinsics -------------------------===//
	//			//
	def int_masked_store : Intrinsic<[], [llvm_anyvector_ty,			def int_masked_store : Intrinsic<[], [llvm_anyvector_ty,
	LLVMAnyPointerType<LLVMMatchType<0>>,			LLVMAnyPointerType<LLVMMatchType<0>>,
	llvm_i32_ty,			llvm_i32_ty,
	LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>],			LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>],
	[IntrArgMemOnly, IntrWillReturn, ImmArg<2>]>;			[IntrArgMemOnly, IntrWillReturn, ImmArg<2>]>;

	▲ Show 20 Lines • Show All 234 Lines • Show Last 20 Lines

llvm/include/llvm/IR/VPIntrinsics.def

This file was added.

				//===-- IR/VPIntrinsics.def - Describes llvm.vp.* Intrinsics -- C++ --===//
				//
				jdoerfertUnsubmitted Done Reply Inline Actions Nit: copy&past jdoerfert: Nit: copy&past
				jdoerfertUnsubmitted Done Reply Inline Actions now too long and wrapped :( No need for `llvm/`, which is confusing as it lives in `llvm/IR`. Also "Instructions" might be a bit to broad? Idk, maybe: "vector intrinsic descriptions" jdoerfert: now too long and wrapped :( No need for `llvm/`, which is confusing as it lives in `llvm/IR`.
				simollAuthorUnsubmitted Done Reply Inline Actions //===-- IR/VPInstruction.def - Describes llvm.vp.* Intrinsics -- C++ --===// simoll: //===-- IR/VPInstruction.def - Describes llvm.vp.* Intrinsics -- C++ --===//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file contains descriptions of the various Vector Predication intrinsics.
				// This is used as a central place for enumerating the different instructions
				// and should eventually be the place to put comments about the instructions.
				//
				//===----------------------------------------------------------------------===//

				// NOTE: NO INCLUDE GUARD DESIRED!

				// Provide definitions of macros so that users of this file do not have to
				// define everything to use it...
				//
				#ifndef REGISTER_VP_INTRINSIC
				#define REGISTER_VP_INTRINSIC(VPID, MASKPOS, VLENPOS)
				#endif

				// Map this VP intrinsic to its functional Opcode
				#ifndef HANDLE_VP_TO_OC
				#define HANDLE_VP_TO_OC(VPID, OC)
				#endif

				jdoerfertUnsubmitted Done Reply Inline Actions To be honest, I would have made the Opcode for vp intrinsics of existing functions `VP_EXISTING` (or similar) and used a single macro definition here, assuming all vp intrinsics so far have a unique corresponding scalar version. These are two unrelated points you can consider (I won't force either). (If you device to do both changes, you can even get away by only defining the existing opcode in the intrinsic "macro definition"). jdoerfert: To be honest, I would have made the Opcode for vp intrinsics of existing functions…
				simollAuthorUnsubmitted Done Reply Inline Actions assuming all vp intrinsics so far have a unique corresponding scalar version. They don't. That's why it's separate. There will be more cases for constrained fp, reduction and memory ops. simoll: > assuming all vp intrinsics so far have a unique corresponding scalar version. They don't.
				jdoerfertUnsubmitted Done Reply Inline Actions I see. Though, even if later intriniscs do not correspond to scalar instructions, the changes would still be possible. I admit, at this point it is a question if you like: REGISTER_VP_INTRINSIC(vp_add, 2, 3) HANDLE_VP_TO_OC(vp_add, Add) REGISTER_VP_INTRINSIC(vp_red, 2, 3) better or worse than REGISTER_VP_FROM_SCALAR_INTRINSIC(Add, 2, 3) REGISTER_VP_INTRINSIC(Red, 2, 3) with the appropriate definition of the above: #define REGISTER_VP_FROM_SCALAR_INTRINSIC(OC, MP, VP) \ REGISTER_VP_INTRINSIC(OC, MP, VP) \ HANDLE_VP_TO_OC(vp_##OC, OC) jdoerfert: I see. Though, even if later intriniscs do not correspond to scalar instructions, the changes…
				simollAuthorUnsubmitted Done Reply Inline Actions I prefer the explicit form REGISTER_VP_INTRINSIC(vp_add, 2, 3) HANDLE_VP_TO_OC(vp_add, Add) The explicit form gives you much nicer diffs. You don't have to read a meta macro to understand what is going on. This file will undergo changes with the next patches. I wouldn't spend too much time compacting it before hasn't "settled down". simoll: I prefer the explicit form REGISTER_VP_INTRINSIC(vp_add, 2, 3) HANDLE_VP_TO_OC(vp_add…
				jdoerfertUnsubmitted Done Reply Inline Actions OK. jdoerfert: OK.
				///// Integer Arithmetic /////

				// llvm.vp.add(x,y,mask,vlen)
				REGISTER_VP_INTRINSIC(vp_add, 2, 3)
				HANDLE_VP_TO_OC(vp_add, Add)

				// llvm.vp.and(x,y,mask,vlen)
				REGISTER_VP_INTRINSIC(vp_and, 2, 3)
				HANDLE_VP_TO_OC(vp_and, And)

				// llvm.vp.ashr(x,y,mask,vlen)
				REGISTER_VP_INTRINSIC(vp_ashr, 2, 3)
				HANDLE_VP_TO_OC(vp_ashr, AShr)

				// llvm.vp.lshr(x,y,mask,vlen)
				REGISTER_VP_INTRINSIC(vp_lshr, 2, 3)
				HANDLE_VP_TO_OC(vp_lshr, LShr)

				// llvm.vp.mul(x,y,mask,vlen)
				REGISTER_VP_INTRINSIC(vp_mul, 2, 3)
				HANDLE_VP_TO_OC(vp_mul, Mul)

				// llvm.vp.or(x,y,mask,vlen)
				REGISTER_VP_INTRINSIC(vp_or, 2, 3)
				HANDLE_VP_TO_OC(vp_or, Or)

				// llvm.vp.sdiv(x,y,mask,vlen)
				REGISTER_VP_INTRINSIC(vp_sdiv, 2, 3)
				HANDLE_VP_TO_OC(vp_sdiv, SDiv)

				// llvm.vp.shl(x,y,mask,vlen)
				REGISTER_VP_INTRINSIC(vp_shl, 2, 3)
				HANDLE_VP_TO_OC(vp_shl, Shl)

				// llvm.vp.srem(x,y,mask,vlen)
				REGISTER_VP_INTRINSIC(vp_srem, 2, 3)
				HANDLE_VP_TO_OC(vp_srem, SRem)

				// llvm.vp.sub(x,y,mask,vlen)
				REGISTER_VP_INTRINSIC(vp_sub, 2, 3)
				HANDLE_VP_TO_OC(vp_sub, Sub)

				// llvm.vp.udiv(x,y,mask,vlen)
				REGISTER_VP_INTRINSIC(vp_udiv, 2, 3)
				HANDLE_VP_TO_OC(vp_udiv, UDiv)

				// llvm.vp.urem(x,y,mask,vlen)
				REGISTER_VP_INTRINSIC(vp_urem, 2, 3)
				HANDLE_VP_TO_OC(vp_urem, URem)

				// llvm.vp.xor(x,y,mask,vlen)
				REGISTER_VP_INTRINSIC(vp_xor, 2, 3)
				HANDLE_VP_TO_OC(vp_xor, Xor)

				#undef REGISTER_VP_INTRINSIC
				#undef HANDLE_VP_TO_OC
				jdoerfertUnsubmitted Done Reply Inline Actions Nit: I'd run clang format on these files as well. Should minimize diffs in the long run. jdoerfert: Nit: I'd run clang format on these files as well. Should minimize diffs in the long run.

llvm/lib/IR/IntrinsicInst.cpp

Show First 20 Lines • Show All 172 Lines • ▼ Show 20 Lines	#define INSTRUCTION(NAME, NARGS, ROUND_MODE, INTRINSIC) \
case Intrinsic::INTRINSIC:		case Intrinsic::INTRINSIC:
#include "llvm/IR/ConstrainedOps.def"		#include "llvm/IR/ConstrainedOps.def"
return true;		return true;
default:		default:
return false;		return false;
}		}
}		}

		ElementCount VPIntrinsic::getVectorLength() const {
		auto GetVectorLengthOfType = [](const Type *T) -> ElementCount {
		SjoerdMeijerUnsubmitted Done Reply Inline Actions Nit: don't think this lamba adds anything, it's just called once. SjoerdMeijer: Nit: don't think this lamba adds anything, it's just called once.
		simollAuthorUnsubmitted Done Reply Inline Actions It'll be called twice in the final version (see reference patch). simoll: It'll be called twice in the final version (see [reference patch](https://reviews.llvm.
		auto VT = cast<VectorType>(T);
		auto ElemCount = VT->getElementCount();
		return ElemCount;
		};

		auto VPMask = getMaskParam();
		return GetVectorLengthOfType(VPMask->getType());
		}

		Value *VPIntrinsic::getMaskParam() const {
		auto maskPos = GetMaskParamPos(getIntrinsicID());
		if (maskPos)
		return getArgOperand(maskPos.getValue());
		return nullptr;
		}

		Value *VPIntrinsic::getVectorLengthParam() const {
		jdoerfertUnsubmitted Done Reply Inline Actions Does it make sense to have this corner case in here or should we just make it an `Optional<int64_t>` and call it a day? You can also remove the lambda if you move the code, except there is a reason to keep it. jdoerfert: Does it make sense to have this corner case in here or should we just make it an…
		simollAuthorUnsubmitted Done Reply Inline Actions Actually, the excessive vector length case should `return None` to imply that the static vector length could not be inferred. Btw, later patches will reuse the lambda. simoll: Actually, the excessive vector length case should `return None` to imply that the static vector…
		jdoerfertUnsubmitted Done Reply Inline Actions Fine with me. jdoerfert: Fine with me.
		auto vlenPos = GetVectorLengthParamPos(getIntrinsicID());
		if (vlenPos)
		return getArgOperand(vlenPos.getValue());
		return nullptr;
		}

		Optional<int> VPIntrinsic::GetMaskParamPos(Intrinsic::ID IntrinsicID) {
		switch (IntrinsicID) {
		default:
		return None;

		#define REGISTER_VP_INTRINSIC(VPID, MASKPOS, VLENPOS) \
		case Intrinsic::VPID: \
		return MASKPOS;
		#include "llvm/IR/VPIntrinsics.def"
		}
		}
		samparkerUnsubmitted Done Reply Inline Actions Are you flexible on this..? For Arm's MVE, it would be useful for us to be able to use a vector length even though we have a fixed vector width. We are trying to use the vectorizers tail folding ability and it could be really nice to use the vector length instead of performing a vector icmp for active lanes. This could free up the 'mask' for other conditions within the loop without having to compare conditions (which will be expensive for us). samparker: Are you flexible on this..? For Arm's MVE, it would be useful for us to be able to use a vector…
		simollAuthorUnsubmitted Done Reply Inline Actions Flexible in what sense? The vector length param should work with scalable types already. To clarify: this returns whether the vector length value for this instance of the intrinsic does not have a masking effect on the lanes. This happens when the vector length parameter is a constant that is larger than the length of the vector type (or if the MSB is set). I'll hoist the test in l225 to l240 up so you can "turn off" the vector length param also for SVE/MVE ( by passing `i32 -1` as the vlen). simoll: Flexible in what sense? The vector length param should work with scalable types already. To…
		samparkerUnsubmitted Done Reply Inline Actions Ah, sorry, I had missed this was for constant lengths! samparker: Ah, sorry, I had missed this was for constant lengths!

		Optional<int> VPIntrinsic::GetVectorLengthParamPos(Intrinsic::ID IntrinsicID) {
		jdoerfertUnsubmitted Done Reply Inline Actions Default clauses make it harder to find all switches that might need to be adjusted. If there is no strong reason to have them we should not. jdoerfert: Default clauses make it harder to find all switches that might need to be adjusted. If there is…
		jdoerfertUnsubmitted Done Reply Inline Actions Forget this one. jdoerfert: Forget this one.
		switch (IntrinsicID) {
		default:
		return None;

		#define REGISTER_VP_INTRINSIC(VPID, MASKPOS, VLENPOS) \
		case Intrinsic::VPID: \
		return VLENPOS;
		#include "llvm/IR/VPIntrinsics.def"
		}
		}

		bool VPIntrinsic::IsVPIntrinsic(Intrinsic::ID ID) {
		switch (ID) {
		default:
		return false;

		#define REGISTER_VP_INTRINSIC(VPID, MASKPOS, VLENPOS) \
		case Intrinsic::VPID: \
		break;
		#include "llvm/IR/VPIntrinsics.def"
		}
		return true;
		}

		// Equivalent non-predicated opcode
		unsigned VPIntrinsic::GetFunctionalOpcodeForVP(Intrinsic::ID ID) {
		switch (ID) {
		default:
		return Instruction::Call;

		jdoerfertUnsubmitted Done Reply Inline Actions Leftover? jdoerfert: Leftover?
		#define HANDLE_VP_TO_OC(VPID, OC) \
		case Intrinsic::VPID: \
		return Instruction::OC;
		#include "llvm/IR/VPIntrinsics.def"
		}
		}

		Intrinsic::ID VPIntrinsic::GetForOpcode(unsigned OC) {
		switch (OC) {
		default:
		return Intrinsic::not_intrinsic;

		#define HANDLE_VP_TO_OC(VPID, OC) \
		jdoerfertUnsubmitted Done Reply Inline Actions I doubt this is a sensible default. I doubt one should return anything for a non-vp intrinsic. jdoerfert: I doubt this is a sensible default. I doubt one should return anything for a non-vp intrinsic.
		simollAuthorUnsubmitted Done Reply Inline Actions If there is no functional IR opcode this is still a call to an intrinsic function. It's natural to return `Call` in this case. simoll: If there is no functional IR opcode this is still a call to an intrinsic function. It's natural…
		jdoerfertUnsubmitted Done Reply Inline Actions not really convinced but OK. jdoerfert: not really convinced but OK.
		case Instruction::OC: \
		return Intrinsic::VPID;
		#include "llvm/IR/VPIntrinsics.def"
		}
		}

		bool VPIntrinsic::canIgnoreVectorLengthParam() const {
		// No vlen param - no lanes masked-off by it.
		auto *VLParam = getVectorLengthParam();
		if (!VLParam)
		return true;

		// Can ignore if MSB of vlen is set.
		auto VLConst = dyn_cast<ConstantInt>(VLParam);
		if (VLConst && VLConst->getSExtValue() == -1)
		andrew.w.kaylorUnsubmitted Done Reply Inline Actions Why are you doing this and not just "less than zero"? andrew.w.kaylor: Why are you doing this and not just "less than zero"?
		simollAuthorUnsubmitted Done Reply Inline Actions Because with the new LangRef wording %evl is unsigned and the `i32 -1` constant is handled as a special case. simoll: Because with the new LangRef wording %evl is unsigned and the `i32 -1` constant is handled as a…
		andrew.w.kaylorUnsubmitted Not Done Reply Inline Actions I see that you also discussed this with Eli. I don't like the fact that constant -1 and a value that is 0xFFFFFFFF at runtime could have different behaviors. There's no telling when the constant folder might find a path to convert a value to a constant. I understand that the intrsincs aren't supposed to be called with values outside the range of [0, W] and has undefined behavior if it is, but having an exception for a constant does seem right. As a completely contrived example, suppose I had IR like this: bb1: br i1 %flag, label %bb2, label %bb3 bb2: %t1 = call <8 x i32> @llvm.vp.add.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n) br label %bb4 bb3: %t2 = call <8 x i32> @llvm.vp.add.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 -1) br label %bb4 bb4: %t3 = phi <8 x i32> [%t1, %bb1], [%t2, %bb2] ... Now some pass is going to transform that into: bb1: %evl = select i1 %flag, i32 %n, i32 -1 %t3 = call <8 x i32> @llvm.vp.add.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %evl) ... And now the -1 case has undefined behavior. andrew.w.kaylor: I see that you also discussed this with Eli. I don't like the fact that constant -1 and a value…
		simollAuthorUnsubmitted Done Reply Inline Actions That's a problem, yes. We could do away with the `-1` special case entirely. Then, to disable evl it has to be set to the number of vector elements. For a <W x ty> vector that's just the constant `W`, for scalable vector types that would be the appropriate `vscale` const exprssion. simoll: That's a problem, yes. We could do away with the `-1` special case entirely. Then, to disable…
		return true;

		// Vlen param greater-equal type vlen - no lanes masked-off.
		if (VLConst) {
		auto ElemCount = getVectorLength();
		if (ElemCount.Scalable)
		return false;

		uint64_t VLNum = VLConst->getZExtValue();
		if (VLNum >= ElemCount.Min)
		return true;
		}

		// Cannot ignore vlen param by default.
		return false;
		}
		jdoerfertUnsubmitted Done Reply Inline Actions Nit: braces. jdoerfert: Nit: braces.
		andrew.w.kaylorUnsubmitted Done Reply Inline Actions You can get here in one step with 'this->getModule()' andrew.w.kaylor: You can get here in one step with 'this->getModule()'

Instruction::BinaryOps BinaryOpIntrinsic::getBinaryOp() const {		Instruction::BinaryOps BinaryOpIntrinsic::getBinaryOp() const {
switch (getIntrinsicID()) {		switch (getIntrinsicID()) {
case Intrinsic::uadd_with_overflow:		case Intrinsic::uadd_with_overflow:
case Intrinsic::sadd_with_overflow:		case Intrinsic::sadd_with_overflow:
case Intrinsic::uadd_sat:		case Intrinsic::uadd_sat:
case Intrinsic::sadd_sat:		case Intrinsic::sadd_sat:
return Instruction::Add;		return Instruction::Add;
case Intrinsic::usub_with_overflow:		case Intrinsic::usub_with_overflow:
case Intrinsic::ssub_with_overflow:		case Intrinsic::ssub_with_overflow:
case Intrinsic::usub_sat:		case Intrinsic::usub_sat:
case Intrinsic::ssub_sat:		case Intrinsic::ssub_sat:
return Instruction::Sub;		return Instruction::Sub;
case Intrinsic::umul_with_overflow:		case Intrinsic::umul_with_overflow:
case Intrinsic::smul_with_overflow:		case Intrinsic::smul_with_overflow:
return Instruction::Mul;		return Instruction::Mul;
		jdoerfertUnsubmitted Done Reply Inline Actions Nit: clang format your changes (or patches) (clang-format.py for vim has the formatdiff option) jdoerfert: Nit: clang format your changes (or patches) (clang-format.py for vim has the formatdiff option)
		simollAuthorUnsubmitted Done Reply Inline Actions This particular function is clang-formatted already.. The existing code in `IntrinsicInst.cpp` is not. I'll do a patch clang-formatting the entire file after this one. simoll: This particular function is clang-formatted already.. The existing code in `IntrinsicInst.cpp`…
default:		default:
llvm_unreachable("Invalid intrinsic");		llvm_unreachable("Invalid intrinsic");
}		}
}		}

bool BinaryOpIntrinsic::isSigned() const {		bool BinaryOpIntrinsic::isSigned() const {
switch (getIntrinsicID()) {		switch (getIntrinsicID()) {
case Intrinsic::sadd_with_overflow:		case Intrinsic::sadd_with_overflow:
Show All 16 Lines

llvm/test/Verifier/vp-intrinsics.ll

This file was added.

				; RUN: opt --verify %s

				define void @test_vp_int(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n) {
				%r0 = call <8 x i32> @llvm.vp.add.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
				%r1 = call <8 x i32> @llvm.vp.sub.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
				%r2 = call <8 x i32> @llvm.vp.mul.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
				%r3 = call <8 x i32> @llvm.vp.sdiv.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
				%r4 = call <8 x i32> @llvm.vp.srem.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
				%r5 = call <8 x i32> @llvm.vp.udiv.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
				%r6 = call <8 x i32> @llvm.vp.urem.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
				%r7 = call <8 x i32> @llvm.vp.and.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
				%r8 = call <8 x i32> @llvm.vp.or.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
				%r9 = call <8 x i32> @llvm.vp.xor.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
				%rA = call <8 x i32> @llvm.vp.ashr.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
				%rB = call <8 x i32> @llvm.vp.lshr.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
				%rC = call <8 x i32> @llvm.vp.shl.v8i32(<8 x i32> %i0, <8 x i32> %i1, <8 x i1> %m, i32 %n)
				ret void
				}

				; integer arith
				declare <8 x i32> @llvm.vp.add.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32)
				declare <8 x i32> @llvm.vp.sub.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32)
				declare <8 x i32> @llvm.vp.mul.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32)
				declare <8 x i32> @llvm.vp.sdiv.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32)
				declare <8 x i32> @llvm.vp.srem.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32)
				declare <8 x i32> @llvm.vp.udiv.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32)
				declare <8 x i32> @llvm.vp.urem.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32)
				; bit arith
				declare <8 x i32> @llvm.vp.and.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32)
				declare <8 x i32> @llvm.vp.or.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32)
				declare <8 x i32> @llvm.vp.xor.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32)
				declare <8 x i32> @llvm.vp.ashr.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32)
				declare <8 x i32> @llvm.vp.lshr.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32)
				declare <8 x i32> @llvm.vp.shl.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32)

llvm/unittests/IR/CMakeLists.txt

Show All 34 Lines	add_llvm_unittest(IRTests
TypesTest.cpp		TypesTest.cpp
UseTest.cpp		UseTest.cpp
UserTest.cpp		UserTest.cpp
ValueHandleTest.cpp		ValueHandleTest.cpp
ValueMapTest.cpp		ValueMapTest.cpp
ValueTest.cpp		ValueTest.cpp
VectorTypesTest.cpp		VectorTypesTest.cpp
VerifierTest.cpp		VerifierTest.cpp
		VPIntrinsicTest.cpp
WaymarkTest.cpp		WaymarkTest.cpp
)		)

target_link_libraries(IRTests PRIVATE LLVMTestingSupport)		target_link_libraries(IRTests PRIVATE LLVMTestingSupport)

llvm/unittests/IR/VPIntrinsicTest.cpp

This file was added.

				//===- VPIntrinsicTest.cpp - VPIntrinsic unit tests ---------===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//

				#include "llvm/ADT/SmallVector.h"
				#include "llvm/AsmParser/Parser.h"
				#include "llvm/IR/Constants.h"
				#include "llvm/IR/IRBuilder.h"
				#include "llvm/IR/IntrinsicInst.h"
				#include "llvm/IR/LLVMContext.h"
				#include "llvm/IR/Module.h"
				#include "llvm/IR/Verifier.h"
				#include "llvm/Support/SourceMgr.h"
				#include "gtest/gtest.h"

				using namespace llvm;

				jdoerfertUnsubmitted Done Reply Inline Actions Not: `using namsepace llvm;` is probably less confusing here jdoerfert: Not: `using namsepace llvm;` is probably less confusing here
				namespace {

				class VPIntrinsicTest : public testing::Test {
				simollAuthorUnsubmitted Done Reply Inline Actions Noted. Will be fixed. simoll: Noted. Will be fixed.
				protected:
				LLVMContext Context;

				VPIntrinsicTest() : Context() {}

				LLVMContext C;
				SMDiagnostic Err;

				std::unique_ptr<Module> CreateVPDeclarationModule() {
				return parseAssemblyString(
				" declare <8 x i32> @llvm.vp.add.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32) "
				" declare <8 x i32> @llvm.vp.sub.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32) "
				" declare <8 x i32> @llvm.vp.mul.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32) "
				" declare <8 x i32> @llvm.vp.sdiv.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32) "
				" declare <8 x i32> @llvm.vp.srem.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32) "
				" declare <8 x i32> @llvm.vp.udiv.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32) "
				" declare <8 x i32> @llvm.vp.urem.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32) "
				" declare <8 x i32> @llvm.vp.and.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32) "
				" declare <8 x i32> @llvm.vp.xor.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32) "
				" declare <8 x i32> @llvm.vp.or.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32) "
				" declare <8 x i32> @llvm.vp.ashr.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32) "
				" declare <8 x i32> @llvm.vp.lshr.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32) "
				" declare <8 x i32> @llvm.vp.shl.v8i32(<8 x i32>, <8 x i32>, <8 x i1>, i32) ",
				Err, C);
				}
				};

				/// Check that VPIntrinsic:canIgnoreVectorLengthParam() returns true
				/// if the vector length parameter does not mask-off any lanes.
				TEST_F(VPIntrinsicTest, CanIgnoreVectorLength) {
				LLVMContext C;
				SMDiagnostic Err;

				std::unique_ptr<Module> M =
				parseAssemblyString(
				"declare <256 x i64> @llvm.vp.mul.v256i64(<256 x i64>, <256 x i64>, <256 x i1>, i32)"
				"declare <vscale x 2 x i64> @llvm.vp.mul.nxv2i64(<vscale x 2 x i64>, <vscale x 2 x i64>, <vscale x 2 x i1>, i32)"
				"define void @test_static_vlen( "
				" <256 x i64> %i0, <vscale x 2 x i64> %si0,"
				" <256 x i64> %i1, <vscale x 2 x i64> %si1,"
				" <256 x i1> %m, <vscale x 2 x i1> %sm, i32 %vl) { "
				" %r0 = call <256 x i64> @llvm.vp.mul.v256i64(<256 x i64> %i0, <256 x i64> %i1, <256 x i1> %m, i32 %vl)"
				" %r1 = call <256 x i64> @llvm.vp.mul.v256i64(<256 x i64> %i0, <256 x i64> %i1, <256 x i1> %m, i32 256)"
				" %r2 = call <256 x i64> @llvm.vp.mul.v256i64(<256 x i64> %i0, <256 x i64> %i1, <256 x i1> %m, i32 0)"
				" %r3 = call <256 x i64> @llvm.vp.mul.v256i64(<256 x i64> %i0, <256 x i64> %i1, <256 x i1> %m, i32 -1)"
				" %r4 = call <256 x i64> @llvm.vp.mul.v256i64(<256 x i64> %i0, <256 x i64> %i1, <256 x i1> %m, i32 123)"
				" %r5 = call <vscale x 2 x i64> @llvm.vp.mul.nxv2i64(<vscale x 2 x i64> %si0, <vscale x 2 x i64> %si1, <vscale x 2 x i1> %sm, i32 -1)"
				" %r6 = call <vscale x 2 x i64> @llvm.vp.mul.nxv2i64(<vscale x 2 x i64> %si0, <vscale x 2 x i64> %si1, <vscale x 2 x i1> %sm, i32 99999)"
				" ret void "
				"}",
				Err, C);

				auto *F = M->getFunction("test_static_vlen");
				assert(F);

				const int NumExpected = 7;
				const bool Expected[] = {false, true, false, true, false, true, false};
				int i = 0;
				for (auto &I : F->getEntryBlock()) {
				VPIntrinsic *VPI = dyn_cast<VPIntrinsic>(&I);
				if (!VPI) {
				ASSERT_TRUE(I.isTerminator());
				continue;
				}

				ASSERT_LT(i, NumExpected);
				ASSERT_EQ(Expected[i], VPI->canIgnoreVectorLengthParam());
				++i;
				}
				}

				/// Check that the argument returned by
				/// VPIntrinsic::Get<X>ParamPos(Intrinsic::ID) has the expected type.
				TEST_F(VPIntrinsicTest, GetParamPos) {
				std::unique_ptr<Module> M = CreateVPDeclarationModule();
				assert(M);

				for (Function &F : *M) {
				ASSERT_TRUE(F.isIntrinsic());
				Optional<int> MaskParamPos =
				VPIntrinsic::GetMaskParamPos(F.getIntrinsicID());
				if (MaskParamPos.hasValue()) {
				Type *MaskParamType = F.getArg(MaskParamPos.getValue())->getType();
				ASSERT_TRUE(MaskParamType->isVectorTy());
				ASSERT_TRUE(MaskParamType->getVectorElementType()->isIntegerTy(1));
				}

				Optional<int> VecLenParamPos =
				VPIntrinsic::GetVectorLengthParamPos(F.getIntrinsicID());
				if (VecLenParamPos.hasValue()) {
				Type *VecLenParamType = F.getArg(VecLenParamPos.getValue())->getType();
				ASSERT_TRUE(VecLenParamType->isIntegerTy(32));
				}
				}
				}

				/// Check that going from Opcode to VP intrinsic and back results in the same
				/// Opcode.
				TEST_F(VPIntrinsicTest, OpcodeRoundTrip) {
				std::vector<unsigned> Opcodes;
				Opcodes.reserve(100);

				{
				#define HANDLE_INST(OCNum, OCName, Class) Opcodes.push_back(OCNum);
				#include "llvm/IR/Instruction.def"
				}

				unsigned FullTripCounts = 0;
				for (unsigned OC : Opcodes) {
				Intrinsic::ID VPID = VPIntrinsic::GetForOpcode(OC);
				// no equivalent VP intrinsic available
				if (VPID == Intrinsic::not_intrinsic)
				continue;

				unsigned RoundTripOC = VPIntrinsic::GetFunctionalOpcodeForVP(VPID);
				// no equivalent Opcode available
				if (RoundTripOC == Instruction::Call)
				continue;

				ASSERT_EQ(RoundTripOC, OC);
				++FullTripCounts;
				}
				ASSERT_NE(FullTripCounts, 0u);
				}

				} // end anonymous namespace

llvm/utils/TableGen/CodeGenIntrinsics.h

Show First 20 Lines • Show All 117 Lines • ▼ Show 20 Lines	struct CodeGenIntrinsic {
bool canThrow;		bool canThrow;

/// True if the intrinsic is marked as noduplicate.		/// True if the intrinsic is marked as noduplicate.
bool isNoDuplicate;		bool isNoDuplicate;

/// True if the intrinsic is no-return.		/// True if the intrinsic is no-return.
bool isNoReturn;		bool isNoReturn;

		/// True if the intrinsic is no-sync.
		bool isNoSync;

/// True if the intrinsic is will-return.		/// True if the intrinsic is will-return.
bool isWillReturn;		bool isWillReturn;

/// True if the intrinsic is cold.		/// True if the intrinsic is cold.
bool isCold;		bool isCold;

/// True if the intrinsic is marked as convergent.		/// True if the intrinsic is marked as convergent.
bool isConvergent;		bool isConvergent;
▲ Show 20 Lines • Show All 60 Lines • Show Last 20 Lines

llvm/utils/TableGen/CodeGenTarget.cpp

Show First 20 Lines • Show All 601 Lines • ▼ Show 20 Lines	CodeGenIntrinsic::CodeGenIntrinsic(Record *R) {
std::string DefName = std::string(R->getName());		std::string DefName = std::string(R->getName());
ArrayRef<SMLoc> DefLoc = R->getLoc();		ArrayRef<SMLoc> DefLoc = R->getLoc();
ModRef = ReadWriteMem;		ModRef = ReadWriteMem;
Properties = 0;		Properties = 0;
isOverloaded = false;		isOverloaded = false;
isCommutative = false;		isCommutative = false;
canThrow = false;		canThrow = false;
isNoReturn = false;		isNoReturn = false;
		isNoSync = false;
isWillReturn = false;		isWillReturn = false;
isCold = false;		isCold = false;
isNoDuplicate = false;		isNoDuplicate = false;
isConvergent = false;		isConvergent = false;
isSpeculatable = false;		isSpeculatable = false;
hasSideEffects = false;		hasSideEffects = false;

if (DefName.size() <= 4 \|\|		if (DefName.size() <= 4 \|\|
▲ Show 20 Lines • Show All 103 Lines • ▼ Show 20 Lines	if (TyEl->isSubClassOf("LLVMMatchType")) {
PrintFatalError(DefLoc,		PrintFatalError(DefLoc,
Twine("ParamTypes is ") + TypeList->getAsString());		Twine("ParamTypes is ") + TypeList->getAsString());
}		}
VT = OverloadedVTs[MatchTy];		VT = OverloadedVTs[MatchTy];
// It only makes sense to use the extended and truncated vector element		// It only makes sense to use the extended and truncated vector element
// variants with iAny types; otherwise, if the intrinsic is not		// variants with iAny types; otherwise, if the intrinsic is not
// overloaded, all the types can be specified directly.		// overloaded, all the types can be specified directly.
assert(((!TyEl->isSubClassOf("LLVMExtendedType") &&		assert(((!TyEl->isSubClassOf("LLVMExtendedType") &&
!TyEl->isSubClassOf("LLVMTruncatedType") &&		!TyEl->isSubClassOf("LLVMTruncatedType")) \|\|
!TyEl->isSubClassOf("LLVMScalarOrSameVectorWidth")) \|\|
VT == MVT::iAny \|\| VT == MVT::vAny) &&		VT == MVT::iAny \|\| VT == MVT::vAny) &&
"Expected iAny or vAny type");		"Expected iAny or vAny type");
} else		} else
VT = getValueType(TyEl->getValueAsDef("VT"));		VT = getValueType(TyEl->getValueAsDef("VT"));

// Reject invalid types.		// Reject invalid types.
if (VT == MVT::isVoid && i != e-1 /void at end means varargs/)		if (VT == MVT::isVoid && i != e-1 /void at end means varargs/)
PrintFatalError(DefLoc, "Intrinsic '" + DefName +		PrintFatalError(DefLoc, "Intrinsic '" + DefName +
Show All 28 Lines	for (unsigned i = 0, e = PropList->size(); i != e; ++i) {
else if (Property->getName() == "Throws")		else if (Property->getName() == "Throws")
canThrow = true;		canThrow = true;
else if (Property->getName() == "IntrNoDuplicate")		else if (Property->getName() == "IntrNoDuplicate")
isNoDuplicate = true;		isNoDuplicate = true;
else if (Property->getName() == "IntrConvergent")		else if (Property->getName() == "IntrConvergent")
isConvergent = true;		isConvergent = true;
else if (Property->getName() == "IntrNoReturn")		else if (Property->getName() == "IntrNoReturn")
isNoReturn = true;		isNoReturn = true;
		else if (Property->getName() == "IntrNoSync")
		isNoSync = true;
else if (Property->getName() == "IntrWillReturn")		else if (Property->getName() == "IntrWillReturn")
isWillReturn = true;		isWillReturn = true;
else if (Property->getName() == "IntrCold")		else if (Property->getName() == "IntrCold")
isCold = true;		isCold = true;
else if (Property->getName() == "IntrSpeculatable")		else if (Property->getName() == "IntrSpeculatable")
isSpeculatable = true;		isSpeculatable = true;
else if (Property->getName() == "IntrHasSideEffects")		else if (Property->getName() == "IntrHasSideEffects")
hasSideEffects = true;		hasSideEffects = true;
▲ Show 20 Lines • Show All 44 Lines • Show Last 20 Lines

llvm/utils/TableGen/IntrinsicEmitter.cpp

Show First 20 Lines • Show All 573 Lines • ▼ Show 20 Lines	if (L->canThrow != R->canThrow)
return R->canThrow;		return R->canThrow;

if (L->isNoDuplicate != R->isNoDuplicate)		if (L->isNoDuplicate != R->isNoDuplicate)
return R->isNoDuplicate;		return R->isNoDuplicate;

if (L->isNoReturn != R->isNoReturn)		if (L->isNoReturn != R->isNoReturn)
return R->isNoReturn;		return R->isNoReturn;

		if (L->isNoSync != R->isNoSync)
		return R->isNoSync;

if (L->isWillReturn != R->isWillReturn)		if (L->isWillReturn != R->isWillReturn)
return R->isWillReturn;		return R->isWillReturn;

if (L->isCold != R->isCold)		if (L->isCold != R->isCold)
return R->isCold;		return R->isCold;

if (L->isConvergent != R->isConvergent)		if (L->isConvergent != R->isConvergent)
return R->isConvergent;		return R->isConvergent;
▲ Show 20 Lines • Show All 125 Lines • ▼ Show 20 Lines	if (ae) {
OS << "};\n";		OS << "};\n";
OS << " AS[" << numAttrs++ << "] = AttributeList::get(C, "		OS << " AS[" << numAttrs++ << "] = AttributeList::get(C, "
<< attrIdx << ", AttrParam" << attrIdx << ");\n";		<< attrIdx << ", AttrParam" << attrIdx << ");\n";
}		}
}		}

if (!intrinsic.canThrow \|\|		if (!intrinsic.canThrow \|\|
(intrinsic.ModRef != CodeGenIntrinsic::ReadWriteMem && !intrinsic.hasSideEffects) \|\|		(intrinsic.ModRef != CodeGenIntrinsic::ReadWriteMem && !intrinsic.hasSideEffects) \|\|
intrinsic.isNoReturn \|\| intrinsic.isWillReturn \|\| intrinsic.isCold \|\|		intrinsic.isNoReturn \|\| intrinsic.isNoSync \|\| intrinsic.isWillReturn \|\|
intrinsic.isNoDuplicate \|\| intrinsic.isConvergent \|\|		intrinsic.isCold \|\| intrinsic.isNoDuplicate \|\| intrinsic.isConvergent \|\|
intrinsic.isSpeculatable) {		intrinsic.isSpeculatable) {
OS << " const Attribute::AttrKind Atts[] = {";		OS << " const Attribute::AttrKind Atts[] = {";
bool addComma = false;		bool addComma = false;
if (!intrinsic.canThrow) {		if (!intrinsic.canThrow) {
OS << "Attribute::NoUnwind";		OS << "Attribute::NoUnwind";
addComma = true;		addComma = true;
}		}
if (intrinsic.isNoReturn) {		if (intrinsic.isNoReturn) {
if (addComma)		if (addComma)
OS << ",";		OS << ",";
OS << "Attribute::NoReturn";		OS << "Attribute::NoReturn";
addComma = true;		addComma = true;
}		}
		if (intrinsic.isNoSync) {
		if (addComma)
		OS << ",";
		OS << "Attribute::NoSync";
		addComma = true;
		}
if (intrinsic.isWillReturn) {		if (intrinsic.isWillReturn) {
if (addComma)		if (addComma)
OS << ",";		OS << ",";
OS << "Attribute::WillReturn";		OS << "Attribute::WillReturn";
addComma = true;		addComma = true;
}		}
if (intrinsic.isCold) {		if (intrinsic.isCold) {
if (addComma)		if (addComma)
▲ Show 20 Lines • Show All 210 Lines • Show Last 20 Lines

This is an archive of the discontinued LLVM Phabricator instance.

[VP,Integer,#1] Vector-predicated integer intrinsicsClosedPublic

Details

Diff Detail

Event Timeline

Changes

Changes

Revision Contents

Diff 246352

llvm/docs/LangRef.rst

llvm/include/llvm/Analysis/TargetTransformInfo.h

llvm/include/llvm/Analysis/TargetTransformInfoImpl.h

llvm/include/llvm/IR/IntrinsicInst.h

llvm/include/llvm/IR/Intrinsics.td

llvm/include/llvm/IR/VPIntrinsics.def

llvm/lib/IR/IntrinsicInst.cpp

llvm/test/Verifier/vp-intrinsics.ll

llvm/unittests/IR/CMakeLists.txt

llvm/unittests/IR/VPIntrinsicTest.cpp

llvm/utils/TableGen/CodeGenIntrinsics.h

llvm/utils/TableGen/CodeGenTarget.cpp

llvm/utils/TableGen/IntrinsicEmitter.cpp

[VP,Integer,#1] Vector-predicated integer intrinsics
ClosedPublic