rkruppe (Robin Kruppe)
User

Projects

User does not belong to any projects.

User Details

User Since
Apr 19 2018, 4:51 AM (65 w, 3 d)

Recent Activity

May 10 2019

rkruppe added a comment to D32530: [SVE][IR] Scalable Vector IR Type.

I know very well how annoying it can be to read and write (and say) the scalable prefix all the time and wish for something shorter sometimes, but I also prefer <vscale x ...> for the reasons Sander gave. I'll add that <vscale x 4 x i32> feels a bit lighter than <scalable 4 x i32> even though it's the same number of characters (maybe because there's more whitespace?).

May 10 2019, 12:55 PM · Restricted Project

May 1 2019

rkruppe added a comment to D32530: [SVE][IR] Scalable Vector IR Type.

Why implementation defined and not UB for the case where the index exceeds the runtime length? How do you intend to define this for SVE?

SVE uses a predicate for indexed inserts and extracts. We generate that predicate by comparing a splat of the index against a stepvector (0,1,2,3...); if the index is out of range then the predicate will be all false.

For a mov (insert), that results in an unmodified vector.

For a lastb (extract), that extracts the last lane in the vector if no predicate bits are true.

I don't know if RVV or SX-Aurora have similarly defined semantics. If the preference is to make it UB, that's fine.
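The out-of-range behaviour described above can be modelled in a few lines. This is only an illustrative sketch in Python, not SVE code generation; the function names (`index_predicate`, `predicated_insert`, `lastb_extract`) are hypothetical, and vectors are modelled as plain lists.

```python
def index_predicate(index, num_lanes):
    # Compare a splat of the index against the step vector (0, 1, 2, ...).
    # An out-of-range index matches no lane, so the predicate is all false.
    return [lane == index for lane in range(num_lanes)]


def predicated_insert(vec, index, value):
    # Predicated mov: only lanes whose predicate bit is set are overwritten.
    pred = index_predicate(index, len(vec))
    return [value if active else old for active, old in zip(pred, vec)]


def lastb_extract(vec, index):
    # lastb-style extract: take the last active lane, or the last lane of
    # the vector when no predicate bit is true.
    pred = index_predicate(index, len(vec))
    active_lanes = [i for i, active in enumerate(pred) if active]
    lane = active_lanes[-1] if active_lanes else len(vec) - 1
    return vec[lane]
```

With a four-lane vector, an insert at index 7 returns the vector unchanged and an extract at index 7 returns the value in lane 3, which is the behaviour the comment describes for an all-false predicate.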

May 1 2019, 3:02 AM · Restricted Project

Apr 5 2019

rkruppe added inline comments to D32530: [SVE][IR] Scalable Vector IR Type.
Apr 5 2019, 2:42 AM · Restricted Project

Mar 7 2019

rkruppe added inline comments to D47770: [MVT][SVE] Add EVT strings and Type mapping.
Mar 7 2019, 2:05 PM

Feb 5 2019

rkruppe added a comment to D57504: RFC: Prototype & Roadmap for vector predication in LLVM.

This seems shaky. When generalized to scalable vector types, it means a load of a scalable vector would be evl.gather(<1 x double*> %p, <scalable n x i1>), which mixes fixed and scaled vector sizes. While it's no big deal to test the divisibility, allowing "mixed scalability" increases the surface area of the feature and not in a direction that seems desirable. For example, it strongly suggests permitting evl.add(<scalable n x i32>, <scalable n x i32>, <n x i1>, ...) where each mask bit controls vscale many lanes -- quite unnatural, and not something that seems likely to ever be put into hardware.

Mixing vector types and scalable vector types is illegal and is not what I was suggesting. Rather, a scalar pointer would be passed to convey a consecutive load/store from a single address.

Feb 5 2019, 12:06 PM · Restricted Project

Feb 4 2019

rkruppe added a comment to D57504: RFC: Prototype & Roadmap for vector predication in LLVM.

We will also need to adjust gather/scatter and possibly other load/store kinds to allow the address vector length to be a divisor of the main vector length (similar to mask vector length). I didn't check whether there are intrinsics for strided load/store; those will need to be changed too, to allow, for example, storing <scalable 3 x float> to var.v in:

.. and as a side effect evl_load/evl_store are subsumed by evl_gather/evl_scatter:

evl.load(%p, %M, %L) ==  evl.gather(<1 x double*> %p, <256 x i1>..) ==  evl.gather(double* %p, <256 x i1> %M, i32 %L)
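The claimed equivalence can be checked on a toy model. The sketch below is Python rather than LLVM IR, all function names are hypothetical, memory is a flat list indexed by address, and disabled lanes are represented as `None`; that last choice is an assumption made for illustration, not the semantics fixed by the RFC.

```python
def evl_gather(memory, addrs, mask, length):
    # Per-lane gather: lane i is read only if it is below the explicit
    # vector length and its mask bit is set.
    return [memory[addr] if (i < length and bit) else None
            for i, (addr, bit) in enumerate(zip(addrs, mask))]


def evl_load(memory, base, mask, length):
    # Consecutive load starting at a single scalar address.
    return [memory[base + i] if (i < length and bit) else None
            for i, bit in enumerate(mask)]


def evl_gather_scalar(memory, base, mask, length):
    # Gather given a scalar pointer: the pointer is expanded to
    # consecutive addresses, one per mask lane.
    addrs = [base + i for i in range(len(mask))]
    return evl_gather(memory, addrs, mask, length)
```

Under these assumptions `evl_load` and `evl_gather_scalar` return the same list for any base address, mask and length, which is the sense in which the load is subsumed by the gather.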
Feb 4 2019, 2:42 PM · Restricted Project
rkruppe added inline comments to D57504: RFC: Prototype & Roadmap for vector predication in LLVM.
Feb 4 2019, 12:59 PM · Restricted Project

Nov 7 2018

rkruppe added inline comments to D53613: RFC: Explicit Vector Length Intrinsics and Attributes.
Nov 7 2018, 9:18 AM · Restricted Project

Nov 5 2018

rkruppe added a comment to D53613: RFC: Explicit Vector Length Intrinsics and Attributes.

Today I took a stab at changing my RVV patches to use these intrinsics and that basically went well, affirming my belief that these intrinsics are a good fit for RISC-V vectors. I stashed those changes for now rather than continuing to build on them because currently I can't match them with plain old isel patterns, so I'd have to write annoying and error-prone custom lowering. That should be a temporary issue, partly due to how I don't really handle predication at the moment, partly due to a surprising extra argument on loads and stores (see inline comment).

Nov 5 2018, 8:16 AM · Restricted Project

Nov 3 2018

rkruppe added a comment to D53613: RFC: Explicit Vector Length Intrinsics and Attributes.
  1. there are operations not visible in the IR (such as register copies) for which you'll probably also need this sort of analysis

Fair enough. Would it be possible to simply extend the %dvl of the defining operation to the newly created register? (instead of re-running a full fledged analysis).

At MIR level, using the semantics of RISC-V instructions, that is not generally correct: uses of the copied register can run with a different VL and therefore use lanes that wouldn't be copied by this approach.

Well, if you generate RISC-V instructions starting from EVL intrinsics then undef-on-excess still holds. So, excess lanes should be fair game for spilling. My hope is that %dvl could be annotated on MIR level like divergence is in the AMDGPU backend today. If the annotation is missing, you'd spill the full register.

Nov 3 2018, 3:11 PM · Restricted Project
rkruppe added a comment to D53613: RFC: Explicit Vector Length Intrinsics and Attributes.

Ideally you'd want these intrinsics for all code, yes, but

  1. since backends don't dictate the IR pass pipeline it will be fragile/impossible to guarantee your pass for turning full vector operations into intrinsics will be last

Actually, you could use custom legalization in ISelLowering for this. No pass involved.

Nov 3 2018, 11:39 AM · Restricted Project

Oct 31 2018

rkruppe added inline comments to D53695: Scalable VectorType RFC.
Oct 31 2018, 7:52 AM
rkruppe added a comment to D53613: RFC: Explicit Vector Length Intrinsics and Attributes.

Spilling only the useful prefix of each vector is important, but I don't think we need to change the IR intrinsics' semantics to enable that. I've sketched an analysis that determines the demanded/consumed vector lengths of each vector value (on MIR in SSA form). With this information the backend can do the same optimization whenever the lanes beyond VL are not ever actually observed. This information is already necessary for many reasons other than spilling, such as implementing regular full-width vector operations (i.e., pretty much everything aside from the intrinsics we discuss here) that can sneak into the IR, or even ordinary register copies (on RISC-V at least). Normally I'd be hesitant to stake such an important aspect of code quality on a best-effort analysis, but in this case it seems very feasible to have very high precision:

Actually, you could translate regular vector code to EVL intrinsics first and have your backend only work on that. This is the route we are aiming for with the SX-Aurora SVE backend. We propose undef-on-excess-lanes as the default semantics of dynamicvl. There is no special interpretation nor a change for IR intrinsics' semantics.

Oct 31 2018, 7:23 AM · Restricted Project
rkruppe added a comment to D53695: Scalable VectorType RFC.

This seems like yet another step in the right direction. Of course I may be biased as I've already been happy with previous iterations.

Oct 31 2018, 6:06 AM
rkruppe added a comment to D53613: RFC: Explicit Vector Length Intrinsics and Attributes.

With the semantics defined in @simoll's proposal, the active vector length is actually subtly different from predication in that the former makes some lanes undef while predication takes the lane value from another parameter. I actually don't know what motivates this: in RISC-V, masked-out lanes and lanes beyond VL are treated the same, and this seems the most consistent choice in any ISA that has both concepts (and ISAs that only have predication would legalize the latter with predication, so they too would treat all lanes the same). Is there an architecture I'm not aware of that makes past-VL lanes undef but leaves masked-out lanes undisturbed?

With the current unmasked_ret semantics, we know exactly the defined range of the result vector because all lanes beyond the dynamicvl argument are undef.
This means that the backend only needs to spill registers up to that value. This matters a lot for wide SIMD architectures like the SX-Aurora (and ARM SVE btw..) where one full vector register comes in at 256x8 byte.
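To put rough numbers on the spill cost, here is a back-of-the-envelope sketch in Python. It takes the 256 lanes of 8 bytes each from the comment above; the function `spill_bytes` and its parameter are hypothetical names, and a real backend would also have to account for alignment and for how it tracks the dynamic vector length.

```python
LANES_PER_REGISTER = 256  # lanes in one full vector register (from the comment)
BYTES_PER_LANE = 8        # bytes per lane (from the comment)


def spill_bytes(dynamic_vl=None):
    # Without a known dynamic vector length the whole register is spilled.
    # With one, only the defined prefix of lanes needs to be stored.
    if dynamic_vl is None:
        lanes = LANES_PER_REGISTER
    else:
        lanes = min(dynamic_vl, LANES_PER_REGISTER)
    return lanes * BYTES_PER_LANE
```

A full-register spill is 2048 bytes under these figures, while a value known to have only 10 defined lanes would need 80 bytes.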

Oct 31 2018, 5:31 AM · Restricted Project
rkruppe added a comment to D53613: RFC: Explicit Vector Length Intrinsics and Attributes.

With the semantics defined in @simoll's proposal, the active vector length is actually subtly different from predication in that the former makes some lanes undef while predication takes the lane value from another parameter. I actually don't know what motivates this: in RISC-V, masked-out lanes and lanes beyond VL are treated the same, and this seems the most consistent choice in any ISA that has both concepts (and ISAs that only have predication would legalize the latter with predication, so they too would treat all lanes the same). Is there an architecture I'm not aware of that makes past-VL lanes undef but leaves masked-out lanes undisturbed?

Oct 31 2018, 2:30 AM · Restricted Project

Oct 29 2018

rkruppe added a comment to D53613: RFC: Explicit Vector Length Intrinsics and Attributes.

I am a bit less sure about the new attributes. If it was just about the intrinsics, I'd argue for creating helper query functions that extract the relevant arguments from a call or Function object, using knowledge of the intrinsic signatures. But on my third reading of the text I finally realized you want to apply them to non-intrinsics as well. An example of how each of these would be used (e.g. by RV or an OpenMP implementation) would be useful. I can see the value of passing the dynamic vector length in a specific register, but at a glance, unmasked_return seems rarely applicable to user-defined functions (similarly to the returned parameter attribute, which is a bit niche).

Two reasons: first, we want to avoid this kind of hard-coded knowledge about intrinsics and second, the attributes allow you to coalesce vector registers.

Oct 29 2018, 10:15 AM · Restricted Project

Oct 24 2018

rkruppe updated subscribers of D53613: RFC: Explicit Vector Length Intrinsics and Attributes.

Thanks a lot for this proposal! It's very unfortunate I couldn't be at the dev meeting to discuss in person.

Oct 24 2018, 1:35 AM · Restricted Project

Sep 11 2018

rkruppe added a comment to D50820: [VPlan] Implement initial vector code generation support for simple outer loops..

I think I found a typo, but otherwise LGTM too!

Sep 11 2018, 9:15 AM

Aug 31 2018

rkruppe added inline comments to D32530: [SVE][IR] Scalable Vector IR Type.
Aug 31 2018, 10:30 AM · Restricted Project

Aug 16 2018

rkruppe added a comment to D50823: [VPlan] Introduce VPCmpInst sub-class in the instruction-level representation.

Our general intention is to make VPInstructions as easy to use as Instructions for the many LLVM developers who aren't necessarily very familiar with the vectorizer, and also to reduce duplicate development/maintenance effort. If that requires more subclassing, we'll consider doing that, but very carefully and only on an as-needed basis. At this point, letting VPInstruction have all the functionality of Instruction is not an objective. We are starting from implementing just enough to satisfy vectorizer needs and minimizing unnecessary divergence in doing so (i.e., what's implemented can be used in a very similar manner).

Sorry that I'm not directly answering your question. I hope this nonetheless helps in evaluating the two alternatives: the new-opcode approach in D50480 and the subclassing approach in this patch. I think subclassing here helps us avoid unnecessary divergence.

Aug 16 2018, 12:15 PM
rkruppe added a comment to D50823: [VPlan] Introduce VPCmpInst sub-class in the instruction-level representation.

The implementation looks good to me. The interface chosen here (directly mirroring CmpInst from the Value hierarchy in the VPValue hierarchy) also seems like the right direction to me. Besides avoiding the problematic concept of "underlying Instructions" altogether, it also gives a convenient place to put any helper functionality that the vectorizer code might want when generating and manipulating such comparisons.

Aug 16 2018, 11:04 AM

Jul 20 2018

rkruppe added inline comments to D32530: [SVE][IR] Scalable Vector IR Type.
Jul 20 2018, 4:34 AM · Restricted Project

Jul 19 2018

rkruppe added inline comments to D49489: [VPlan] VPlan version of InterleavedAccessInfo..
Jul 19 2018, 5:39 AM

Jul 16 2018

rkruppe added a comment to D32530: [SVE][IR] Scalable Vector IR Type.

Thank you! I took another look and found two nits, sorry for not pointing them out earlier.

Jul 16 2018, 6:10 AM · Restricted Project

Jul 12 2018

rkruppe added a comment to D32530: [SVE][IR] Scalable Vector IR Type.

I just realized the updated RFC doesn't touch on the issue at all, but I think it's safe to say we won't support globals of scalable vector type? Those seem impossible to implement in a sensible way for RISC-V, and if my memory and quick skim-reading are correct, they aren't part of the SVE C language extensions either. If that's correct, I'd expect the verifier to reject global variables whose type is a scalable vector.

Jul 12 2018, 4:26 AM · Restricted Project
rkruppe added a comment to D47775: [AArch64][SVE] Add SplatVector Intrinsic.
Jul 12 2018, 4:24 AM