Page MenuHomePhabricator

rkruppe (Hanna Kruppe)
User

Projects

User does not belong to any projects.

User Details

User Since
Apr 19 2018, 4:51 AM (102 w, 5 d)

Recent Activity

Thu, Mar 19

rkruppe added a comment to D69891: [VP,Integer,#1] Vector-predicated integer intrinsics.

Reading through the LangRef change post commit I was struck by this wording: "The VP intrinsic has undefined behavior if `%evl > W`."

This is a very strong interpretation and doesn't match what we do for arithmetic. One consequence of that difference is that hoisting a vp intrinsic out of a loop will not be legal unless we can prove the EVL is in bounds. You probably want some variant of a poison semantic propagation instead. Unless of course, real hardware happens to fault on out of bounds EVL in which case the current semantic is what you want. :)

Otherwise, nice work.

Thu, Mar 19, 11:27 AM · Restricted Project

Feb 14 2020

rkruppe added a comment to D57504: RFC: Prototype & Roadmap for vector predication in LLVM.

ah... ah... you can't. at least, the last version of the RVV spec that i read (7?) still explicity states, "regardless of what *you* want VL to be set to, the *hardware* gets to decide exactly what value *actually* goes into the VL CSR".

the only guarantee that you have is that you will find that if you set VL to a non-zero value, you will find that, when you read it immediately after setting, it will be non-zero.

Feb 14 2020, 6:41 AM · Restricted Project

Feb 12 2020

rkruppe added a comment to D69891: [VP,Integer,#1] Vector-predicated integer intrinsics.

Ok. This is a summary of the requested changes and i'd like to get your go ahead before "committing" them to the patch:

  1. Define what 'W' is (that is the vector length that %evl is compared against) - for static vector types this would be the number of elements of the vector, for scalable types W == MVL and the target is responsible for defining MVL.
Feb 12 2020, 8:51 AM · Restricted Project
rkruppe added a comment to D57504: RFC: Prototype & Roadmap for vector predication in LLVM.

Exactly. The VE target strictly requires VL <= MVL or you'll get a hardware exception. Enforcing strict UB here means VP-users have to explicitly drop instructions that keep the VL within bounds. This means that we can optimize the VL computation code and that it can be factored into cost calculations, etc. With Options 2 & 3 this would happen only very late in the backend when most scalar optimizations are already done.

Feb 12 2020, 8:05 AM · Restricted Project

Feb 3 2020

rkruppe added a comment to D69891: [VP,Integer,#1] Vector-predicated integer intrinsics.

I'm not entirely clear where this value comes from, but it seems like whatever generated it must know that we're generating SVE code, right?

This is not architecture specific, and thus this is not assuming SVE. In the case of SVE, the vector length is unknown at compile time (it is a multiple of something), and constant at run-time. In the RISC-V vector extension, I believe it can be changed at any point. Thus, the way to obtain this value is by reading/writing a status register, or something similar, but relying on querying the architecture features. And whether it is constant at runtime, or can be changed at any point, this API seems to cover the different cases.

Feb 3 2020, 2:36 AM · Restricted Project

Feb 2 2020

rkruppe added a comment to D57504: RFC: Prototype & Roadmap for vector predication in LLVM.

We (Libre-SoC, provisionally renamed from Libre-RISCV) are currently building a processor that supports variable-length vector operations by having each operation specify the starting register in a flat register file, then relying on VL telling it how many elements to operate on, which, when divided by the number of elements per register, directly translates to the number of registers to operate on. So, if VL is out of bounds, the instructions can overwrite registers past the end of the range assigned by the register allocator and/or trap. This would probably force use of option #1 above, at least for our processor. Our ISA design is still incomplete, so we might add (or already have) a mechanism allowing use of option #2 or #3 if there is a sufficient reason (will have to see what the rest of Libre-SoC think).

Feb 2 2020, 12:42 PM · Restricted Project
rkruppe added a comment to D57504: RFC: Prototype & Roadmap for vector predication in LLVM.

(This was gonna be an inline comment on D69891, but it's more of a general conceptual issue, so I decided to move it here.)

Feb 2 2020, 8:08 AM · Restricted Project

Jan 31 2020

rkruppe added a comment to D57504: RFC: Prototype & Roadmap for vector predication in LLVM.

I'm not sure what problem you think there might be? Both code sequences do the same thing (same side effects, same final result) as the input IR they matched, right? So that's what justifies them both as valid outputs and the choice is just a matter of codegen quality. You don't even need to appeal to the vp.fadd producing undef in disabled lanes, because in the final result those lanes are zero anyway and that's all that matters. This doesn't seem fundamentally more tricky than any other isel pattern that matches multiple IR instructions to produce a more efficient combined instruction. For example, if the ARM backend selects add i32 %a, (shl i32 %b, 4) as add r0, r0, r1, lsl #4, it never materializes shl %b, 4 (not into a register, at least) but the end result is still correct.

Jan 31 2020, 10:26 AM · Restricted Project
rkruppe added a comment to D57504: RFC: Prototype & Roadmap for vector predication in LLVM.

Btw, I guess that https://reviews.llvm.org/D71432 shows that op+select folding can be cleanly implemented in isel and that's also in line with my experiments for the VE target.

This needs a caveat. Keeping the select glued to the operation takes some careful effort. Especially in the undef passthru case, there are a bunch of peeps that will incorrectly fold away the select. E.g. this transform from InstSimplify:

if (isa<UndefValue>(FalseVal))   // select ?, X, undef -> X
  return TrueVal;

The VP intrinsics will certainly be immune to these, but if the plan is to eventually replace the VP select intrinsics with IR selects, then this problem will need to be solved. Just a heads up...

Jan 31 2020, 8:10 AM · Restricted Project

Dec 3 2019

rkruppe added inline comments to D57504: RFC: Prototype & Roadmap for vector predication in LLVM.
Dec 3 2019, 10:06 AM · Restricted Project

Dec 2 2019

rkruppe added a comment to D57504: RFC: Prototype & Roadmap for vector predication in LLVM.

Ok. I do agree that having passthru simplifies isel for certain architectures (and legal combinations of passthru value, type, and operations..) but:
VP intrinsics aren't target intrinsics: they are not supposed to be a way to directly program any specific ISA in the way of a macroassembler, like you would do with llvm.x86.* or llvm.arm.mve.* or any other. Rather, think of them as regular IR instructions. Pretend that anything we propose in VP intrinsics will end up as a feature of a first-class LLVM instructions. Based on that i figured that one VP intrinsics should match one IR instructions plus predication, nothing more.

  • If we had predicated IR instructions, would we want them to have a passthru operand?
Dec 2 2019, 1:02 PM · Restricted Project

Nov 27 2019

rkruppe added inline comments to D61652: [Attr] Introduce dereferenceable_globally.
Nov 27 2019, 11:23 AM · Restricted Project

Nov 24 2019

rkruppe added inline comments to D69891: [VP,Integer,#1] Vector-predicated integer intrinsics.
Nov 24 2019, 7:35 AM · Restricted Project

Nov 23 2019

rkruppe added inline comments to D61652: [Attr] Introduce dereferenceable_globally.
Nov 23 2019, 11:06 AM · Restricted Project

Aug 4 2019

rkruppe updated subscribers of D65718: [LangRef] Document forward-progress requirement.

Clearly the current semantics of LLVM IR have a forward progress guarantee and I agree that should be documented.

Aug 4 2019, 11:47 AM · Restricted Project

May 10 2019

rkruppe added a comment to D32530: [SVE][IR] Scalable Vector IR Type.

I know very well how annoying it can be to read and write (and say) the scalable prefix all the time and wish for something shorter sometimes, but I also prefer <vscale x ...> for the reasons Sander gave. I'll add that <vscale x 4 x i32> feels a bit lighter than <scalable 4 x i32> even though it's the same number of characters (maybe because there's more whitespace?).

May 10 2019, 12:55 PM · Restricted Project

May 1 2019

rkruppe added a comment to D32530: [SVE][IR] Scalable Vector IR Type.

Why implementation defined and not UB for the case where the index exceeds the runtime length? How do you intend to define this for SVE?

SVE uses a predicate for indexed inserts and extracts. We generate that predicate by comparing a splat of the index against a stepvector (0,1,2,3...); if the index is out of range then the predicate will be all false.

For a mov (insert), that results in an unmodified vector.

For a lastb (extract), that extracts the last lane in the vector if no predicate bits are true.

I don't know if RVV or SX-Aurora have similarly defined semantics. If the preference is to make it UB, that's fine.

May 1 2019, 3:02 AM · Restricted Project

Apr 5 2019

rkruppe added inline comments to D32530: [SVE][IR] Scalable Vector IR Type.
Apr 5 2019, 2:42 AM · Restricted Project

Mar 7 2019

rkruppe added inline comments to D47770: [MVT][SVE] Add EVT strings and Type mapping.
Mar 7 2019, 2:05 PM · Restricted Project

Feb 5 2019

rkruppe added a comment to D57504: RFC: Prototype & Roadmap for vector predication in LLVM.

This seems shaky. When generalized to scalable vector types, it means a load of a scalable vector would be evl.gather(<1 x double*> %p, <scalable n x i1>), which mixes fixed and scaled vector sizes. While it's no big deal to test the divisibility, allowing "mixed scalability" increases the surface area of the feature and not in a direction that seems desirable. For example, it strongly suggests permitting evl.add(<scalable n x i32>, <scalable n x i32>, <n x i1>, ...) where each mask bit controls vscale many lanes -- quite unnatural, and not something that seems likely to ever be put into hardware.

Mixing vector types and scalable vector types is illegal and is not what i was suggesting. Rather, a scalar pointer would be passed to convey a consecutive load/store from a single address.

Feb 5 2019, 12:06 PM · Restricted Project

Feb 4 2019

rkruppe added a comment to D57504: RFC: Prototype & Roadmap for vector predication in LLVM.

We will also need to adjust gather/scatter and possibly other load/store kinds to allow the address vector length to be a divisor of the main vector length (similar to mask vector length). I didn't check if there are intrinsics for strided load/store, those will need to be changed too, to allow, for example, storing <scalable 3 x float> to var.v in:

.. and as a side effect evl_load/evl_store are subsumed by evl_gather/evl_scatter:

evl.load(%p, %M, %L) ==  evl.gather(<1 x double*> %p, <256 x i1>..) ==  evl.gather(double* %p, <256 x i1> %M, i32 %L)
Feb 4 2019, 2:42 PM · Restricted Project
rkruppe added inline comments to D57504: RFC: Prototype & Roadmap for vector predication in LLVM.
Feb 4 2019, 12:59 PM · Restricted Project

Nov 7 2018

rkruppe added inline comments to D53613: RFC: Explicit Vector Length Intrinsics and Attributes.
Nov 7 2018, 9:18 AM · Restricted Project

Nov 5 2018

rkruppe added a comment to D53613: RFC: Explicit Vector Length Intrinsics and Attributes.

Today I took a stab at changing my RVV patches to use these intrinsics and that basically went well, affirming belief that these intrinsics are a good fit for RISC-V vectors. I stashes those changes for now rather than continuing to build on them because currently I can't match them with plain old isel patterns so I'd have to write annoying and error-prone custom lowering. That should be a temporary issue, partly due to how I don't really handle predication at the moment, partly due to a surprising extra argument on loads and stores (see inline comment).

Nov 5 2018, 8:16 AM · Restricted Project

Nov 3 2018

rkruppe added a comment to D53613: RFC: Explicit Vector Length Intrinsics and Attributes.
  1. there are operations not visible in the IR (such as register copies) for which you'll probably also need this sort of analysis

Fair enough. Would it be possible to simply extend the %dvl of the defining operation to the newly created register? (instead of re-running a full fledged analysis).

At MIR level, using the semantics of RISC-V instructions, that is not generally correct: uses of the copied register can run with a different VL and therefore use lanes that wouldn't be copied by this approach.

Well, if you generate RISC-V instructions starting from EVL intrinsics then undef-on-excess still holds. So, excess lanes should be fair game for spilling. My hope is that %dvl could be annotated on MIR level like divergence is in the AMDGPU backend today. If the annotation is missing, you'd spill the full register.

Nov 3 2018, 3:11 PM · Restricted Project
rkruppe added a comment to D53613: RFC: Explicit Vector Length Intrinsics and Attributes.

Ideally you'd want these intrinsics for all code, yes, but

  1. since backends don't dictate the IR pass pipeline it will be fragile/impossible to guarantee your pass for turning full vector operations into intrinsics will be last

Actually, you could use custom legalization in ISelLowering for this. No pass involved.

Nov 3 2018, 11:39 AM · Restricted Project

Oct 31 2018

rkruppe added inline comments to D53695: Scalable VectorType RFC.
Oct 31 2018, 7:52 AM
rkruppe added a comment to D53613: RFC: Explicit Vector Length Intrinsics and Attributes.

Spilling only the useful prefix of each vector is important, but I don't think we need to change the IR intrinsics' semantics to enable that. I've sketched an analysis that determines the demanded/consumed vector lengths of each vector value (on MIR in SSA form). With this information the backend can do the same optimization whenever the lanes beyond VL are not ever actually observed. This information is already necessary for many reasons other than spilling, such as implementing regular full-width vector operations (i.e., pretty much everything aside from the intrinsics we discuss here) that can sneak into the IR, or even ordinary register copies (on RISC-V at least). Normally I'd be hesitant to staking such an important aspect of code quality on a best-effort analysis, but in this case it seems very feasible to have very high precision:

Actually, you could translate regular vector code to EVL intrinsics first and have your backend only work on that. This is the route we are aiming for with the SX-Aurora SVE backend. We propose undef-on-excess-lanes as the default semantics of dynamicvl. There is no special interpretation nor a change for IR intrinsics' semantics.

Oct 31 2018, 7:23 AM · Restricted Project
rkruppe added a comment to D53695: Scalable VectorType RFC.

This seems like yet another step in the right direction. Of course I may be biased as I've already been happy with previous iterations.

Oct 31 2018, 6:06 AM
rkruppe added a comment to D53613: RFC: Explicit Vector Length Intrinsics and Attributes.

With the semantics defined in @simoll's proposal, the active vector length is actually subtly different from predication in that the former makes some lanes undef while predication takes the lane value from another parameter. I actually don't know what motivates this, in RISC-V masked-out lanes and lanes beyond VL are treated the same and this seems the most consistent choice in any ISA that has both concepts (and ISAs that only have predication would legalize the latter with predication so they too would treat all lanes the the same). Is there an architecture I'm not aware of that makes past-VL lanes undef but leave masked-out lanes undisturbed?

With the current unmasked_ret semantics, we know exactly the defined range of the result vector because all lanes beyond the dynamicvl argument are undef.
This means that the backend only needs to spill registers up to that value. This matters a lot for wide SIMD architectures like the SX-Aurora (and ARM SVE btw..) where one full vector register comes in at 256x8 byte.

Oct 31 2018, 5:31 AM · Restricted Project
rkruppe added a comment to D53613: RFC: Explicit Vector Length Intrinsics and Attributes.

With the semantics defined in @simoll's proposal, the active vector length is actually subtly different from predication in that the former makes some lanes undef while predication takes the lane value from another parameter. I actually don't know what motivates this, in RISC-V masked-out lanes and lanes beyond VL are treated the same and this seems the most consistent choice in any ISA that has both concepts (and ISAs that only have predication would legalize the latter with predication so they too would treat all lanes the the same). Is there an architecture I'm not aware of that makes past-VL lanes undef but leave masked-out lanes undisturbed?

Oct 31 2018, 2:30 AM · Restricted Project

Oct 29 2018

rkruppe added a comment to D53613: RFC: Explicit Vector Length Intrinsics and Attributes.

I am a bit less sure about the new attributes. If it was just about the intrinsics, I'd argue for creating helper query functions like that extract the relevant arguments from a call or Function object, using knowledge of the intrinsic signatures. But on my third reading of the text I finally realized you want to apply them to non-intrinsincs as well. An example of how each of these would be used (e.g. by RV or an OpenMP implementation) would be useful. I can see the value of passing the dynamic vector length in a specific register, but at a glance, unmasked_return seems rarely applicable to user-defined functions (similarly to the returned parameter attribute, which is a bit niche).

Two reasons: first, we want to avoid this kind of hard-coded knowledge about intrinsics and second, the attributes allow you to coalesce vector registers.

Oct 29 2018, 10:15 AM · Restricted Project

Oct 24 2018

rkruppe updated subscribers of D53613: RFC: Explicit Vector Length Intrinsics and Attributes.

Thanks a lot for this proposal! It's very unfortunate I couldn't be at the dev meeting to discuss in person.

Oct 24 2018, 1:35 AM · Restricted Project

Sep 11 2018

rkruppe added a comment to D50820: [VPlan] Implement initial vector code generation support for simple outer loops..

I think I found a typo, but otherwise LGTM too!

Sep 11 2018, 9:15 AM

Aug 31 2018

rkruppe added inline comments to D32530: [SVE][IR] Scalable Vector IR Type.
Aug 31 2018, 10:30 AM · Restricted Project

Aug 16 2018

rkruppe added a comment to D50823: [VPlan] Introduce VPCmpInst sub-class in the instruction-level representation.

Our general intention is to make VPInstructions as easy to use as Instructions to many LLVM developers, who aren't necessarily very familiar with vectorizer, and also reduce the duplicate development/maintenance effort. If that requires more subclassing, we'll consider doing that but very carefully, and only as needed basis. At this point, letting VPInstruction to have all the functionality of Instruction is not an objective. We are starting from implementing just enough to satisfy vectorizer needs and minimizing unnecessary divergence in doing so (i.e., what's implemented can be used in a very similar manner).

Sorry that I'm not directly answering your question. Hope this helps in evaluating between the two alternatives: new opcode approach in D50480 and subclassing approach in this patch, however. I think subclassing here helps us avoid unnecessary divergence.

Aug 16 2018, 12:15 PM · Restricted Project
rkruppe added a comment to D50823: [VPlan] Introduce VPCmpInst sub-class in the instruction-level representation.

The implementation looks good to me. The interface chosen here (directly mirroring CmpInst from the Value hierarchy in the VPValue hierarchy) also seems like the right direction to me. Besides avoiding the problematic concept of "underlying Instructions" altogether, it also gives a convenient place to put any helper functionality that the vectorizer code might want when generating and manipulating such comparisons.

Aug 16 2018, 11:04 AM · Restricted Project

Jul 20 2018

rkruppe added inline comments to D32530: [SVE][IR] Scalable Vector IR Type.
Jul 20 2018, 4:34 AM · Restricted Project

Jul 19 2018

rkruppe added inline comments to D49489: [VPlan] VPlan version of InterleavedAccessInfo..
Jul 19 2018, 5:39 AM

Jul 16 2018

rkruppe added a comment to D32530: [SVE][IR] Scalable Vector IR Type.

Thank you! I took another look and found two nits, sorry for not pointing them out earlier.

Jul 16 2018, 6:10 AM · Restricted Project

Jul 12 2018

rkruppe added a comment to D32530: [SVE][IR] Scalable Vector IR Type.

I just realized the updated RFC doesn't touch on the issue at all, but I think it's safe to say we won't support globals of scalable vector type? Those seems impossible to implement in a sensible way for RISC-V, and if my memory and quick skim-reading is correct, it isn't part of the SVE C language extensions either. If that's correct, I'd expect the verifier to reject global variables whose type is a scalable vector.

Jul 12 2018, 4:26 AM · Restricted Project
rkruppe added a comment to D47775: [AArch64][SVE] Add SPLAT_VECTOR ISD Node.
Jul 12 2018, 4:24 AM · Restricted Project