Page MenuHomePhabricator

[RISCV] Splat scalar to be of length VL instead of 1 for reductions
AbandonedPublic

Authored by pcwang-thead on Nov 6 2022, 11:34 PM.

Details

Summary

Correct me if I am wrong, I think there is minmal microarchitecture
difference between vmv.s.x with vl==1 and v(f)mv.v.(ixf) with
vl==avl. So we can just splat scalar to be of length VL instead
of 1 to reduce context switch of vtype.

For VP reductions, we can do this iff we can prove AVL is non-zero.

To not increase register pressure, we do this iff LMUL <= 1.

Diff Detail

Event Timeline

pcwang-thead created this revision.Nov 6 2022, 11:34 PM
pcwang-thead requested review of this revision.Nov 6 2022, 11:34 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 6 2022, 11:34 PM
pcwang-thead edited the summary of this revision. (Show Details)Nov 6 2022, 11:35 PM
pcwang-thead added inline comments.Nov 6 2022, 11:43 PM
llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-fp.ll
41

Floating-point regressions here.

1347

Spilling because of increasing of register pressure.

pcwang-thead marked 2 inline comments as not done.Nov 6 2022, 11:44 PM

Only for LMUL <= 1.

pcwang-thead edited the summary of this revision. (Show Details)Nov 7 2022, 1:21 AM

Make sure operand's types are of LMUL=1 types

pcwang-thead edited the summary of this revision. (Show Details)Nov 7 2022, 2:59 AM
craig.topper added inline comments.Nov 10 2022, 10:42 PM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
5805–5806

Move this into an else. We shouldn't create nodes if they are going to end up dead.

  • Address comment.
  • Fix floating-point regressions since this change will prevent combining fadd to reduce.

Comment inline with a problematic case.

Note as well that InsertVSETVLI should already be eliding the vsetvli when we can prove the avl non-zero.

llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-fp-vp.ll
14

There's a subtle semantic distinction between the old and proposed code here where a0 is zero.

The old code would unconditionally insert the neutral element into v9, and then the vfredusum would see a VL=0, and not update the destination register. As a result, the final return value is the neutral element.

The new code leaves v9 unchanged, and thus the result is whatever lane 0 of the v9 register happened to contain previously.

For VP reductions, we can do this iff we can prove AVL is non-zero.

Add tests for VP reductions with known non-zero avl.

pcwang-thead marked an inline comment as done.Nov 13 2022, 8:24 PM
pcwang-thead edited the summary of this revision. (Show Details)

Add missing tests.

pcwang-thead marked an inline comment as done.Nov 20 2022, 6:44 PM

Ping :-)

Gentle ping. :-)

craig.topper added inline comments.Nov 30 2022, 11:31 AM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
1907

AVL is unsigned. Why getSExtValue?

5808

Add curly braces. LLVM coding standards say that if/else should both uses braces if one does.

5882

Curly braces

5963

Curly braces

llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-fp.ll
368–369

Why is this e32,m1 instead of e32, mf2?

  • Address comments.
  • Fix incorrect EVT when combining to reductions.
pcwang-thead marked 5 inline comments as done.Nov 30 2022, 10:23 PM
pcwang-thead added inline comments.
llvm/test/CodeGen/RISCV/rvv/fixed-vectors-reduction-fp.ll
368–369

Good catch! Fixed.

reames added a comment.Dec 7 2022, 8:53 AM

I'm looking at your test changes - not the code - and it really seems like the sole benefit of this here is vsetvli insertion. I'd default to expecting this to belong inside RISCVInsertVSETVLI.cpp not during ISEL.

From what I can tell glancing at the test deltas, you're effectively proposing the following rules:

  • A scalar insert w/TA demands the "non-zeroness" of the AVL.
  • A splat or broadcast load w/TA demands the an equal or greater AVL (with the same LMUL).

From what I can tell, we have no reason to prefer the v.x/v.i/v.f form over the s.x/s.f forms. The changes in your diff appear to be an artifact of how you implemented the above rules, not a benefit on their own. Unless you disagree with this, I would ask that you rework the patch and patch description to make that obvious.

Worth noting is that even if you decide to keep this in ISEL, there's nothing about the rules above which are specific to reductions.

pcwang-thead marked an inline comment as done.Dec 7 2022, 8:11 PM

I'm looking at your test changes - not the code - and it really seems like the sole benefit of this here is vsetvli insertion.

Yes, you're right. My intention is to reduce vsetvlis.

I'd default to expecting this to belong inside RISCVInsertVSETVLI.cpp not during ISEL.

Thanks for your suggestions, I will do some works on RISCVInsertVSETVLI.cpp to see if it is beneficial.

From what I can tell glancing at the test deltas, you're effectively proposing the following rules:

  • A scalar insert w/TA demands the "non-zeroness" of the AVL.
  • A splat or broadcast load w/TA demands the an equal or greater AVL (with the same LMUL).

From what I can tell, we have no reason to prefer the v.x/v.i/v.f form over the s.x/s.f forms. The changes in your diff appear to be an artifact of how you implemented the above rules, not a benefit on their own. Unless you disagree with this, I would ask that you rework the patch and patch description to make that obvious.

Worth noting is that even if you decide to keep this in ISEL, there's nothing about the rules above which are specific to reductions.

I have no strong opinion on this. Let's see what we can do in RISCVInsertVSETVLI.cpp first. :-)

I sat down yesterday to prototype the approach I had suggested, and basically end up convincing myself I was wrong. Not from a theoretical standpoint on the vsetvli toggles, but from the fact we really do want to change instruction selection in some cases as well.

In that vein, I wrote up a small patch series (D139656) which tries to work towards this objective, and apply the same logic more broadly. I also landed two small changes to InsertVSETVLI to better handle vmv.s.x idioms exposed by these patches. I'm curious what you think of those.

reames added a comment.Dec 9 2022, 4:10 PM

FYI, D139747 implements a strict subset of this patch.

reames requested changes to this revision.Dec 13 2022, 12:19 PM

Can you rebase? I think that most if not all, of the changes from this have now landed in other patches. If you think there's anything left, a rebase would help make that much more obvious.

This revision now requires changes to proceed.Dec 13 2022, 12:19 PM
pcwang-thead abandoned this revision.Dec 13 2022, 6:45 PM

I think there is no left changes.
Thanks. @reames