This is an archive of the discontinued LLVM Phabricator instance.

[LegalizeTypes][VP] Widen arithmetic ops of fixed length vectors to VP nodes
AbandonedPublic

Authored by luke on Apr 17 2023, 6:12 AM.

Details

Summary

If we have a node with an illegal fixed length vector result type that
needs widened, e.g. x:v6i32 = add a, b
Instead of just widening it to: x:v8i32 = add a, b
We can widen it to the equivalent VP operation and set the EVL to the
exact number of elements needed: x:v8i32 = vp_add a, b, mask=true, evl=6
Provided that the target supports the equivalent VP operation on the
widened type.

This patch applies this technique when widening unary, binary and
ternary ops, as well as selects, but there are more
places this could be applied.

This is an extension of https://reviews.llvm.org/D148713

Diff Detail

Event Timeline

luke created this revision.Apr 17 2023, 6:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 17 2023, 6:12 AM
luke requested review of this revision.Apr 17 2023, 6:12 AM
luke added inline comments.Apr 17 2023, 6:16 AM
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
4320

This was changed so that it can reuse the ShouldWidenToVP logic.

luke updated this revision to Diff 514256.Apr 17 2023, 8:43 AM

Fix optional check

Where do non-power of 2 vectors come from?

Where do non-power of 2 vectors come from?

Languages like SYCL, OpenCL, and shader languages certainly provide vectors of length 3 (which are usually sized as 4 for allocations and such). I've never seen vector lengths other than 3 in the real world.

I've had a long-standing task in my backlog to see if we can make those 3-element vectors legal in the RISC-V backend for better code. For that reason, I find this patch quite elegant.

I don't know if we want a target override to control this behaviour on top of the legality of the VP operations themselves. Just in case a target has legal VP nodes but prefers the widening for whatever reason. I'd suggest we keep this approach as-is, and let other targets tweak the behaviour if it's not suitable.

luke added a comment.Apr 18 2023, 2:54 AM

Where do non-power of 2 vectors come from?

Languages like SYCL, OpenCL, and shader languages certainly provide vectors of length 3 (which are usually sized as 4 for allocations and such). I've never seen vector lengths other than 3 in the real world.

The two cases I had in mind:

  1. If the vector interleave/deinterleave intrinsics are extended to support interleave groups > 2, the loop vectorizer can emit non-power of 2 vectors loads/stores, e.g. 3 groups and a VF of 4 would result in a 12-element load
  2. It might be interesting to explore loosening the restriction on SLP that it must vectorise bundles by powers of 2.

Neither of these actually exist in main yet though. I thought it would be interesting to submit this patch anyway as a proof of concept, and am happy to wait until we have some concrete use cases for it.

I've had a long-standing task in my backlog to see if we can make those 3-element vectors legal in the RISC-V backend for better code.

Credit where credit is due, the basis of this patch came from your comment about fixed-length vectors in https://reviews.llvm.org/D111248!

Do you still remember where were the regressions you saw?
I can confirm that vp_load/vp_store are at least missing one optimisation, in insertelt_v3i64 in fixed-vectors-insert.ll. I'm not sure if it's as a result of not scalarizing or because DAGCombiner doesn't have enough aliasing information.

Can we split load/stores to their own patch? I'm happy to approve those.

llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int.ll
949

Can't we use a VL=6 vdivu.vv?

I'm a little concerned about the lack of DAG combines on VP nodes. Will using VP prevent optimizations?

luke added a comment.Apr 19 2023, 4:41 AM

Can we split load/stores to their own patch? I'm happy to approve those.

Of course, will do.

I'm a little concerned about the lack of DAG combines on VP nodes. Will using VP prevent optimizations?

I believe so. Long term I presume we'll have to flesh out https://reviews.llvm.org/D141891 to cover every combine under the sun, which sounds like a lot of work. (This is the "Incrementally lift DAGCombiner to work on VP SDNodes as well as on regular vector instructions." step of the VP roadmap)
(cc @fakepaper56 do you have any other plans to extend the use of VPMatchContext?)

I'll look into the quality of non-combined-VP-widened code vs combined-but-non-VP-widened code in the wild and get back to you on this.

As a side note, I think loads and stores might be the some of the most affected, since FindBetterChain and friends in DAGCombine would need to be updated to reason about aliasing etc. in the presence of VP nodes.

luke added inline comments.Apr 19 2023, 4:52 AM
llvm/test/CodeGen/RISCV/rvv/fixed-vectors-int.ll
949

I think so, this patch just didn't handle the case for ops that can trap like udiv/urem etc.

luke updated this revision to Diff 514946.Apr 19 2023, 6:54 AM

Split out loads and stores

luke retitled this revision from [LegalizeTypes][VP] Widen fixed length vectors to VP nodes to [LegalizeTypes][VP] Widen arithmetic ops of fixed length vectors to VP nodes.Jun 6 2023, 2:12 AM
luke added a comment.Jun 6 2023, 2:23 AM

Just a brief update from the last time I visited this patch:
Using VP nodes for unary/binary/ternary ops worked well, but unless we're able to use VP nodes for all nodes in a sequence we end up generating extra vsetvli toggles.
In particular, I haven't been able to come up with a VP equivalent sequence for build_vector nodes, and I'm not sure how possible it will be.

After discussing offline with @reames, I think the issue of reducing the VL of arithmetic ops to match the smaller VLs of memory ops from D148713 might be better handled in the vsetvli pass.

luke abandoned this revision.Jun 22 2023, 4:11 AM