Page MenuHomePhabricator

Please use GitHub pull requests for new patches. Phabricator shutdown timeline

[RISCV] Combine vmv.s.x of constants into vmv.v.i

Authored by luke on May 29 2023, 9:16 AM.



If a vmv.s.x has an undef passthru and the scalar is a constant that
fits into 5 bits, then we can use a vmv.v.i to save an li. If the VL is
a constant > 0, then we can also just set that to 1 to allow for more
toggle removal opportunities.

This patch adds a combine for this, and also teaches the insert vsetvli
pass how to treat vmv.v.i similarly to vmv.s.x, where we can expand the
SEW to avoid a toggle since we're only writing one element. Without the
vsetvli change it actually results in worse codegen because of the
additional toggle.

We could just use the VSETVLIInfo in needVSETVLI to check for this in
the top-to-bottom pass, but then we miss out on the post-lowering
bottom-to-top pass, which catches cases where the vmv.v.i is the first
instruction in an entry block.

As noted in getDemanded, we shouldn't be looking at the VL operand as
it could be stale if the instruction is already lowered.
But I've convinced myself that checking the VL operand is actually ok here,
because even if it's stale, the demanded fields for a vmv.v.i with VL=1
should still be the same, regardless of what its actual lowered VL is.
(A sanity check here would be greatly appreciated!)

Diff Detail

Event Timeline

luke created this revision.May 29 2023, 9:16 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 29 2023, 9:16 AM
luke requested review of this revision.May 29 2023, 9:16 AM

We may need to limit it to LMUL<=1 in order not to increase register pressure.

Similar patches: D139656 and its related patches.

reames added inline comments.May 30 2023, 1:10 PM

This looks like something we should maybe be doing for all vmv.s.x instructions. What happens if you hoist this out of the vmv.v.i specific transform?


The transform as written here is incorrect. Consider the case where we have a non-undef pass through operand.

Take a look at the use of isImplictDef in needVSETVLI.

luke added inline comments.Jun 1 2023, 3:44 AM

The VLs do get smaller but there's the same amount of toggles. I think the insertvsetvli pass already allows the VL to be increased or decreased on a vmv.s.x pseudo to avoid a toggle (as long as it maintains that it is or isn't equal to zero), whilst here it was specifically to accommodate the fact that this gets selected as a vmv.v.i pseudo

I'm not 100% sure if it's worthwhile but I've attached the diff anyway

luke added inline comments.Jun 1 2023, 5:10 AM

Thanks for pointing this out, just want to check my understanding here: Do we currently allocate M2/4/8 register classes for vmv.s.x of LMUL>1, or do we ignore register groups for it yet? I see there's but it doesn't look to be upstream yet.

If we are allocating M2/4/8, then I think this transform from vmv.x.s to vmv.v.i will keep LMUL the same.

Otherwise if not, then I can see why we would need to add the check to avoid LMUL=1 ballooning to LMUL=8

luke updated this revision to Diff 527383.Jun 1 2023, 6:02 AM

Check that the tail is undefined, also convince myself that checking the VL operand is ok


Your understanding is right that we allocate M2/4/8 register classes for vmv.s.x of LMUL>1. D139699 is just a PoC and I don't know if it's the right way to go. :-)
I think I misunderstood your patch before, there is no need to consider the register pressure because LMUL won't change.
I have no problem here now. :-)

luke added inline comments.Jun 6 2023, 2:08 AM

No worries, thanks for the clarification! If the PoC does go in then let me know and I can revisit this to constrain LMUL

luke retitled this revision from [WIP][RISCV] Combine vmv.s.x of constants into vmv.v.i to [RISCV] Combine vmv.s.x of constants into vmv.v.i.Jun 6 2023, 8:28 AM
luke edited the summary of this revision. (Show Details)
luke abandoned this revision.Wed, Sep 20, 7:06 AM

Done in D152845