Page MenuHomePhabricator

[RISCV] Enable fixed length vectorization
ClosedPublic

Authored by reames on Aug 9 2022, 9:49 AM.

Details

Summary

This change enables the use of RISCV's variable length vector registers for fixed length vectors in the IR, and implicitly enables various IR transforms which generate fixed length vectors if legal (e.g. LoopVectorize and SLPVectorize). Specifically, this enables fixed length vectors which are known to be inbounds of the underlying variable hardware size.

For context, remember that the +V extension provides a minimum VLEN of 128. The embedded variants provide lower minimums. The analogy here is essentially vectorizing for SSE on a machine which may or may not include AVX2/AVX512. We won't get full utilization by default, but we will get some benefit. And of course, with an explicit mcpu we can vectorize to the exact target hardware.

The LV impact is mostly related to vectorizer robustness. In cases we haven't yet fully implemented scalable vectorization support, we can fall back to fixed length vectorization. Note that there a bunch of cases we haven't yet implemented, so in practice this is a fairly major shift towards auto-vectorizing more often.

On the SLP side, I haven't done anywhere near as detailed an evaluation, but the initial investigation I did do ran into a few open issues. Given that, I've posted a change which disables SLP even when fixed vectors are enabled, and marked it as dependency for this one.

Diff Detail

Event Timeline

reames created this revision.Aug 9 2022, 9:49 AM
Herald added a project: Restricted Project. · View Herald Transcript
reames requested review of this revision.Aug 9 2022, 9:49 AM
reames added inline comments.Aug 9 2022, 9:55 AM
llvm/test/CodeGen/RISCV/fold-vector-cmp.ll
13

This case demonstrates an unfortunate, but I think non-blocking interaction. Essentially, with vectors illegal, we force scalarization and then constant fold the individual scalar lanes. With vectors legal, we fail to recognize that scalarization is profitable, or that lane 0 is unused. As a result, we fail to constant fold. This is a general problem with vector codegen optimization, and not directly related to this change.

llvm/test/CodeGen/RISCV/vec3-setcc-crash.ll
2 ↗(On Diff #451188)

TODO: This test needs to be restricted to continue exercising scalar lowering.

llvm/test/Transforms/LoopVectorize/RISCV/inloop-reduction.ll
85 ↗(On Diff #451188)

TODO: We're deciding that fixed length vectorization is profitable over scalable vectorization when both are legal. That's odd, and probably indicates a problem in the cost model.

llvm/test/Transforms/LoopVectorize/RISCV/scalable-basics.ll
331 ↗(On Diff #451188)

This is an example of fallback logic working as expected.

(We have a known problem around scalable scatter/gather costing.)

llvm/test/Transforms/LoopVectorize/RISCV/scalable-divrem.ll
252 ↗(On Diff #451188)

Again, fallback working as expected due to known problem with scalable vectorization.

reames edited the summary of this revision. (Show Details)Aug 9 2022, 9:55 AM
reames added inline comments.Aug 9 2022, 10:21 AM
llvm/test/Transforms/LoopVectorize/RISCV/inloop-reduction.ll
85 ↗(On Diff #451188)

This turns out to be a known issue in the cost model. So as to return a conservative correct cost, we're using an upper bound on VL. This results in scalable vectors (for which maximum VL is potentially quite large) appearing unprofitable. In this case, the cost is a function of log2(max-vl), but we have other instances - such as scatter/gather - where the cost is linear in max-VL and magnifies this effect even further.

I don't think this is a blocking issue for enabling fixed length vectorization.

@reames

Below are some thoughts which might give you some food for thought, but frankly I just wanted to ask someone for an advice^^.

We have a peculiar architecture that has only scalable vectors.
They are different compared to SVE and RISCV V extensions in that the minimum size is just one element and the maximum size is known (e.g. 32 64-bit elements).
The ISA has load / store instructions and a few others with both static and dynamic counters, i.e. you can write (pseudo code) "dst = vload.f32 rep 16 [ptr]" or "dst = vloadf32 rep vlen [ptr]".
We don't currently use the dynamic version, we just map fixed point vectors to pseudo register classes of the same size. This is quite messy, because these pseudo register classes also require pseudo instructions for each vector size (not only loads / stores, but all of them).
I'd like to rework it to have just one register class (like SVE does), but here is the problem: when spilling occurs, one needs to know the effective size of the spilled register. If we use one register class for all possible fixed vectors, the size is hard to figure out, if possible.
I guess you will face the same issue, but it should be easier for you, since you are taking the minimum VLEN, which is rather small, so you can just spill VLEN bits, no matter what the effective size is.

reames added a comment.Aug 9 2022, 3:41 PM

@barannikov88 - I don't see how your last comment connects to this review. If you want to ask a question on your hardware, please email me and we can chat briefly.

llvm/test/CodeGen/RISCV/vec3-setcc-crash.ll
2 ↗(On Diff #451188)

Done

llvm/test/Transforms/LoopVectorize/RISCV/inloop-reduction.ll
85 ↗(On Diff #451188)

Fix out for review as D131519

barannikov88 added a comment.EditedAug 9 2022, 4:07 PM

@barannikov88 - I don't see how your last comment connects to this review. If you want to ask a question on your hardware, please email me and we can chat briefly.

Sorry for intervening. I wanted to bring to your attention that if you map fixed vectors to scalable registers, you will need a way to know the effective size of the register when a need for a spill arises.
I don't know if the ISA allows you to extract the number of elements from the internal part of the vector register, but you can always spill 128 bits (the min VLEN), wasting some stack space if the effective size is smaller.
That is just of more importance for our target, so I was wondering how are you going to solve this spilling issue, if at all.

ADD
I won't take more of your time and just question on discourse forum. Sorry if my point was inappropriate, I probably misunderstood the commit message (my English is far from good).

reames added a comment.Aug 9 2022, 4:31 PM

@barannikov88 - I don't see how your last comment connects to this review. If you want to ask a question on your hardware, please email me and we can chat briefly.

Sorry for intervening. I wanted to bring to your attention that if you map fixed vectors to scalable registers, you will need a way to know the effective size of the register when a need for a spill arises.
I don't know if the ISA allows you to extract the number of elements from the internal part of the vector register, but you can always spill 128 bits (the min VLEN), wasting some stack space if the effective size is smaller.
That is just of more importance for our target, so I was wondering how are you going to solve this spilling issue, if at all.

On RISCV, the current implementation of fixed length vectors uses however many bits of the vector register are required. When spilling a vector register, we currently spill the full register, and make no attempt at tracking which sub-lanes are live. We could in theory spill less if sub-lanes of the register are dead, but a) this is unlikely to be a significant stack savings, and b) the spill instructions easiest to use work on full vector registers.

Matt added a subscriber: Matt.Aug 9 2022, 8:03 PM
reames retitled this revision from [WIP][RISCV] Enable fixed length vectorization to [RISCV] Enable fixed length vectorization.Aug 13 2022, 8:47 AM
reames edited the summary of this revision. (Show Details)
reames set the repository for this revision to rG LLVM Github Monorepo.
reames edited the summary of this revision. (Show Details)Aug 25 2022, 10:51 AM
craig.topper accepted this revision.Aug 26 2022, 12:32 PM

LGTM

llvm/test/CodeGen/RISCV/fpclamptosat_vec.ll
17 ↗(On Diff #451278)

This might need a rebase. I don't think we use vncvt.x.x.w anymore after D132041

This revision is now accepted and ready to land.Aug 26 2022, 12:32 PM
This revision was landed with ongoing or failed builds.Aug 26 2022, 2:45 PM
This revision was automatically updated to reflect the committed changes.