This patch adds IR intrinsics for vector-predicated integer arithmetic.
It is subpatch #1 of the integer slice of LLVM-VP.
LLVM-VP is a larger effort to bring native vector predication to LLVM.
Differential D69891
[VP,Integer,#1] Vector-predicated integer intrinsics simoll on Nov 6 2019, 6:19 AM. Authored by
Details This patch adds IR intrinsics for vector-predicated integer arithmetic. It is subpatch #1 of the integer slice of LLVM-VP.
Diff Detail
Event TimelineThere are a very large number of changes, so older changes are hidden. Show Older Changes
Comment Actions Thanks for the comments so far :)
Comment Actions Changes
Comment Actions I think I'm fine with this. Anyone else?
Comment Actions +1 from me! Thanks for doing this!
Comment Actions Thanks for reviewing! I'll upload a final update for the LangRef.
Comment Actions Thanks for your efforts to move this forward @simoll!
Comment Actions
Comment Actions
Comment Actions
Comment Actions Just wanted to check your plans because the status is "planned changes to this revision". Comment Actions
Comment Actions Unit tests: fail. 62371 tests passed, 3 failed and 839 were skipped. failed: LLVM.CodeGen/AMDGPU/llvm.amdgcn.s.buffer.load.ll failed: LLVM.CodeGen/AMDGPU/smrd.ll failed: libc++.std/thread/thread_mutex/thread_mutex_requirements/thread_mutex_requirements_mutex/thread_mutex_class/try_lock.pass.cpp clang-tidy: fail. clang-tidy found 0 errors and 18 warnings. 0 of them are added as review comments below (why?). clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch. Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml Pre-merge checks is in beta. Report issue. Please join beta or enable it for your project. Comment Actions I know it's very late for me to be bringing this up, but I have a couple of broad questions -- one minor, one less minor. First, when we introduced the constrained FP intrinsics Chandler suggested inserting ".experimental" in the names because of the likelihood that something would need to change before we were done and the "experimental" modifier would make the auto-upgrader handling slightly less ugly. That same reasoning seems like it would apply here. Second, I'm not 100% comfortable with how the explicit vector length argument is represented. Yes, I know, I should have brought this up about a year ago. I don't have any objections to the existence of this argument. I understand that we need some way to handle it. I'm just wondering if we can handle it in some way that makes it more like an optional argument. I'm not entirely clear where this value comes from, but it seems like whatever generated it must know that we're generating SVE code, right? My concern is that for targets that don't do SVE kinds of things we still have this entirely redundant, if not meaningless, operand to carry around. What would you think of this being provided as an operand bundle when it is needed? Comment Actions
This is not architecture specific, and thus this is not assuming SVE. In the case of SVE, the vector length is unknown at compile time (it is a multiple of something), and constant at run-time. In the RISC-V vector extension, I believe it can be changed at any point. Thus, the way to obtain this value is by reading/writing a status register, or something similar, but relying on querying the architecture features. And whether it is constant at runtime, or can be changed at any point, this API seems to cover the different cases. Comment Actions You're mixing up different meanings of "vector length" here:
Although the EVL argument is not architecture specific in that it has the same semantics on all targets and can be legalized on all targets, Andrew is right that in practice, one would not generate vector code using the EVL argument to perform stripmining without knowing that the target architecture has efficient hardware support for it. Instead, that argument would be fixed to a constant -1 if the code does not want/need this concept. This is not too different from many other vector operations (e.g., shufflevector can express arbitrary shuffles, but many architectures support only a subset of those efficiently, and vectorization cost models take that into account). The real question is, do targets without native support for EVL have anything to gain from making the argument optional? I don't think so:
So I think that making this argument optional accomplishes nothing, it only makes support for targets that do care (RISC-V V, SX-Aurora VE) more complicated. Comment Actions I don't mind changing the name to 'llvm.experimental.vp.add', etc . The mid to long-term goal is first-class VP instructions in any case.
Operand bundles are not the same as optional parameters. Operand bundles pessimistically imply side-effects where as a plain i32 argument is innocuous to optimizations:
That is reasonable for constrained fp intrinsics, which always have such a dependence. Many VP intrinsics however do not have any side effects or only inspect/modify memory through their pointer arguments, etc. Yep. Thanks for clarifying this. Once LLVM supports optional parameters without adverse effects, we can re-visit this discussion until then i'd strongly prefer the EVL to be an explicit parameter. Comment Actions Alright, I was reaching for a more general concept that I obviously don't know the term for. What I meant to say was just that whatever generated the argument must know something specific about the target architecture. We could add a callsite attribute when the intrinsic is created. We have that information about the intrinsic, right? I'll admit that there may be no practical difference between what you've proposed and what I have in mind (in the abstract sense that assumes it's even possible to implement this in the way that I'm imagining). Mostly, I'm trying to explore what I perceive to be a bad smell in the IR. Having a parameter that is irrelevant for some targets just doesn't seem right. Introducing these intrinsics with "experimental" in the name would make me feel a bit better about that, even though again it has no practical effect. I'm not sure I agree that is the real question. What is gained by not having the floating point constraint arguments always present even when they aren't needed? We have values that describe the default mode, so we could use those when we want the default mode. I think we all agree that they shouldn't be there when we don't need them, yet I would argue that those arguments are more relevant when "not needed" than the evl argument is. I'm imagining walking someone through the IR, explaining what each instruction means. I'd be a bit embarrassed when I get to a vp intrinsic and say, "Ignore the last argument. We don't use it for this target." But like I said above, this is much more a vague indication to me that we haven't got this right yet than a concrete problem.
Comment Actions To me all of this points in the direction that we'd want a standard mechanism for optional parameters. We have three options for this right now: ; all optional parameters set (eg RISC-V V or SX-Aurora) %x= llvm.fp.add(%a, %b) mask(%mask), evl(%evl), fpenv(fpround.tonearest, fpexcept.strict) ; constrained fp SIMD intrinsic (eg AVX512) %y= llvm.fp.add(%a, %b) mask(%mask), fpenv(fpround.tonearest, fpexcept.strict) ; unpredicated, default fp env fp add (isomorphic to the fadd instruction) %z= llvm.fp.add(%a, %b) This is the solution that i'd want to see in LLVM to solve the optional parameter problem once and for all for constrained fp *and* VP (we should look out for other potential applications and start an RFC). b) OpBundles (imply side-effects unless overriden with callsite attributes) - clunky, could form the starting point for a) though. declare opaquety @llvm.vp.mask(<8 x i1>) declare opaquety @llvm.vp.evlmask(<8 x i1>, i32 %evl) declare <8 x float> @llvm.vp.fadd(%a, %b, opaquety mask) For constrained fp, this could look as follows declare opaquety @llvm.constrainedfp.fpenv(metadata, metadata) declare opaquety @llvm.constrained.fadd(%a, %b, opaquety fpenv) Note that we could allow chaining of these intrinsics (eg add a passthru opquety parameter to the defining intrinsics): opaquety %fpenv = llvm.constrainedfp.fpenv(opquety undef, fpround.toneares, fpexcept.strict) opaquety %opt_params = llvm.vp.evlmask(%fpenv, <8 x ii1>%mask, i32 %evl) %c = llvm.fadd(%a, %b, opt_params) BUT before any of these solutions are adopted, i'd rather we model the optional parameters as standard, explicit parameters with default values. We can pursue tha general solution for the optional parameter problem in parallel.
Comment Actions I like your first proposal for optional parameters. It "feels right" to me, and I agree that it is an improvement on operand bundles. Obviously it would take some time to build consensus for something like that as a new IR construct. I can live with the explicit parameter for now if we can agree on rules for how it is defined to behave on targets where it does not apply. Comment Actions Ok. This is a summary of the requested changes and i'd like to get your go ahead before "committing" them to the patch:
Comment Actions I object. The only sensible definition for the upper bound on %evl is the number of elements in the vector type in question (because %evl counts vector elements). We already have a concept of "number of elements in a vector type" that generalizes across all vector types (fixed-width and scalable), and it is quite target-independent: <vscale x $N x $elemtype > (where $N is a constant) has vscale * $N elements. The only target-dependent part is in the positive integer vscale, and its behavior is quite restricted. All of this is already stated in the LangRef (in the section on vector types), and there is no reason for VP intrinsics to repeat it, let alone deviate from it by e.g. introducing new terminology (MVL) or making provisions for more target-dependent behavior than actually exists. The VP instructions should just define their behavior in terms of the number of elements in the vector. Comment Actions +1 Comment Actions I feel like I should mention that I don't know how Chandler feels about the use of "experimental" for these intrinsics. I wasn't trying to claim his approval of my suggestion, merely explaining that I thought the reasoning that led to the current naming of the constrained FP intrinsics probably applied here as well. After I made that suggestion @craig.topper pointed out to me that the "experimental" qualifier has a tendency to never go away. See, for example, vector add/reduce intriniscs. All this is to say that my suggestion is just a suggestion and I could be convinced to drop it if that is the consensus. Comment Actions Well, to set this straight, i didn't mean to imply Chandler's approval. I just want to document what motivates the experimental prefix.
How about we stick with llvm.vp.fadd and go for llvm.vp.fadd.v2, etc when/if the intrinsics are updated? Comment Actions
Comment Actions Thanks for the review!
Comment Actions
Comment Actions
Comment Actions Ping. The i32 -1 special case is gone and canIgnoreVectorLength works for both regular and scalable vector types. Are there any more comments or can this go in? Comment Actions I'm satisfied with the functionality, but I'm not sure about the intrinsics having undefined behavior outside the [0, W] range. The way you've implemented it, it seems like the behavior would be predictable. If the evl argument is outside that range, it is ignored. Applying an unsigned value greater than W using the "%mask AND %EVLmask" also has this effect. Why not just make that the defined behavior?
Comment Actions To directly lower VP to NEC SX-Aurora %evl strictly needs to be within the range 0 to W or you get a hardware exception. Defining any behavior outside of that range thus implies additional instructions to restrict %evl to its bounds or guard the VP op. Clearly we do not want that. At the same time un-defining the behavior outside of that range does not hamper AVX512 code generation in any way.
Semantically, yes. The difference is in the code generation.
When %evl is lowered to a mask there is still a risk of overflow in the comparison when the underlying vector type is widened (consider an operation on <256 x i8> elements and %evl ==258.. when that operation is widened to <512 x i8> you need to do something about that %EVLmask or you'll get spurious active bits in the upper half). If that is UB to begin you do not need to consider it in the %EVLmask computation. So, even non-AVL targets benefit from the strict defined range for %evl. Comment Actions OK. Since the behavior for out-of-range evl is target-dependent, undefined makes sense. I don't know if you were waiting for input from anyone else, but this looks good to me. Comment Actions Great :) Thanks to everybody involved for reviewing! The next patch will introduce the ExpandVectorPredicationPass, which folds %evl into the mask if requested through TLI. Comment Actions Given the scale of the change, it might be good to let people on llvm-dev know that a this patch is ready to land and it is about to be committed (and wait till early next week in case there's additional feedback). Comment Actions Ok, let's wait until next week to give people more time to react. I will also update the reference patch in the main RFC noting that the integer VP patch was accepted. Comment Actions
Comment Actions Reading through the LangRef change post commit I was struck by this wording: "The VP intrinsic has undefined behavior if `%evl > W`." This is a very strong interpretation and doesn't match what we do for arithmetic. One consequence of that difference is that hoisting a vp intrinsic out of a loop will not be legal unless we can prove the EVL is in bounds. You probably want some variant of a poison semantic propagation instead. Unless of course, real hardware happens to fault on out of bounds EVL in which case the current semantic is what you want. :) Otherwise, nice work. |
I expect this to also work for scalable vectors, so maybe add a case here for <vscale x W x T> as well?