Page MenuHomePhabricator

[RISCV] Add initial support for converting fixed vectors to scalable vectors during lowering to use RVV instructions.

Authored by craig.topper on Jan 29 2021, 3:53 PM.



This is an alternative to D95563.

This is modeled after a similar feature for AArch64's SVE that uses
predicated scalable vector instructions.a

Rather than use predication, this patch uses an explicit VL operand.
I've limited it to always use LMUL=1 for now, but we can improve this
in the future.

This requires a bunch of new ISD opcodes to carry the VL operand.
I think we can probably lower intrinsics to these ISD opcodes to
cut down on the size of the isel table. Which is why I've added
patterns for all integer/float types and not just LMUL=1.

I'm only testing one vector width right now, but the width is
programmable via the command line.

Diff Detail

Event Timeline

craig.topper created this revision.Jan 29 2021, 3:53 PM
craig.topper requested review of this revision.Jan 29 2021, 3:53 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 29 2021, 3:53 PM
Herald added a subscriber: MaskRay. · View Herald Transcript

Use 0.10 as the version in the new td file

jrtc27 added inline comments.Jan 29 2021, 3:58 PM

Not sure how you managed this diff :)

craig.topper added inline comments.Jan 29 2021, 4:07 PM

That's amazing. I don't either.

Remove stray change


arcbbb added a subscriber: arcbbb.Jan 31 2021, 6:37 PM
arcbbb added inline comments.

For supporting LMUL > 1,
I was thinking calculating LMUL by

unsigned MinVLen = Subtarget.getMinRVVVectorSizeInBits();
unsigned LMul = (VT.getSizeInBits() + MinVLen - 1) / MinVLen;

and expand the case MVT::i8: by

case MVT::i8:
   if (LMul == 1)
      return EVT(MVT::nxv8i8);
   else if (LMul == 2)
      return EVT(MVT::nxv16i8);
   else if (LMul <= 4)
       return EVT(MVT::nxv32i8);
   else if (LMul <= 8)
       return EVT(MVT::nxv64i8);
craig.topper added inline comments.Jan 31 2021, 7:16 PM

I guess my concern is that large LMULs increase register pressure so depending on the code it might be better to split the operations than use the increased LMUL if it will cause spills.

arcbbb added inline comments.Jan 31 2021, 7:35 PM

I have a use case for [1]

blur_x.store_at(blur_y, y).compute_at(blur_y, yi).vectorize(x, 8);

User can evaluate various VLS choices by changing the number of element in .vectorize() and evaluate which LMul gains most performance.
not sure if it makes sense.

HsiangKai added inline comments.Jan 31 2021, 7:53 PM

V has no such requirement or I misunderstood the specification?

craig.topper added inline comments.Jan 31 2021, 7:59 PM

Good point. I blindly copied that from AArch64 and was more focused on getting on to the lowering work.

What restrictions should we have here?

HsiangKai added inline comments.Jan 31 2021, 10:27 PM

There are two restrictions in specification. VLEN≥128 and VLEN must be a power of 2.

khchen added inline comments.Jan 31 2021, 11:31 PM

I think maybe the vectorizer or users(Halide) need to aware the register pressure (or performance) when they perform transformation. If codegen want to help on avoiding register pressure problem, maybe backend could have an IR pass to find out the best LMUL for scalabe vector type operation?

I think this is a sensible direction to consider. I do think we'll need to think about other LMULs though.

If we know the minimum size, there's presumably a way to successively enable larger vectors as the size increases? Ideally with vector-bits-min=1024 we'd be able to support e.g. <16 x i64> with LMUL=1, with vector-bits-min=512 we could do it with LMUL=2. Is that going to be challenging?

We might have to limit it somewhat sensibly: for vector types that would require LMUL=8 we're not necessarily going to see a huge difference in performance compared with just splitting the vector in two.


Copy/paste on SVE?


Is this something that is likely to be part of the vector predication support, once that becomes more "first class"?

craig.topper added inline comments.Feb 1 2021, 11:45 AM

It looks like the vector predication SD nodes also have a mask operand. So I'm not sure if we should synthesize an all 1s mask to pattern match back out. Or if we should DAG combine an all ones mask on the vector predication SD nodes to these nodes.

It looks like for masking, the vector predication nodes just make the masked out elements undefined, so I think we still need a VSELECT to specify a passthru value? So I'm not sure if we should pattern match that sequence or DAG combine to an ISD node that has the mask, passthru, and VL all together?

Matt added a subscriber: Matt.Feb 1 2021, 12:52 PM

Add support for different LMuls with a command line option to limit.

HsiangKai added inline comments.Feb 2 2021, 10:45 PM
60 ↗(On Diff #320600)

Need rebase here.


Use RVVBaseAddr instead of reg_rs1.


small case for "experimental-zfh".


Do we need different check prefix for riscv32 and riscv64? That is LMULMAX2-RV32 and LMULMAX2-RV64.

-Correct the register classes for fixed vectors for LMUL > 1.
-Correct the mapping for udiv/sdiv. I copy pasted the same mistake SDPatterns had before

Remove stale FIXME

frasercrmck added inline comments.Feb 4 2021, 1:43 AM

Yeah that seems a shame. I've been wondering how we'd support the vp intrinsics and I guess that this question about the SDNodes is an extension of that. I seem to recall @rogfer01 asking about them on the list: I wonder if he has any ideas? Do we only ever expect an all-ones mask and mandate that the VL is the only "predicate" we use?

Circling back a bit: my original question was about the fact that it seems a shame we have to make our own copies of all of these nodes. Could we lower fixed-length vectors to scalable-vector VP nodes with an all-ones mask, and pattern-match that?

But if we do indeed have to duplicate nodes to account for the passthru value for "full" VP support then we'd probably have to add some operands to these nodes. It would be good to share these nodes for this purpose and for the VP support. Do we do that now to get it out of the way, or later? Maybe the extra VP operands could be optional which means that the "fixed-length" patterns won't have to change when the time comes.


typo: beyong

-Add mask argument to binary ops. Did not add passthru value which is consistent with VP_*
-Maintaining custom set of nodes since FP nodes ISD::VP_ are missing. Also the VE target is translating all VP_* nodes to custom nodes.

frasercrmck requested changes to this revision.Feb 5 2021, 2:42 AM
frasercrmck added inline comments.

The sdiv/udiv mapping is incorrect. Somehow it sneaked back in?

This revision now requires changes to proceed.Feb 5 2021, 2:42 AM

The "require changes" feels really drastic to me but I thought HsiangKai's LGTM was official acceptance, sorry!

The "require changes" feels really drastic to me but I thought HsiangKai's LGTM was official acceptance, sorry!

I also think this patch needs your acceptance. So, I didn't accept the patch. I think there still are some issues to address. Thanks for your feedback.

craig.topper added inline comments.Feb 5 2021, 9:36 AM

The RV32 and RV64 codegen seem to be the same for all tests for LMULMAX2. We're not allowed to have unused prefixes that don't appear in a function so I can't add LMULMAX2-RV32/RV64 if they aren't used.

Fix sdiv/udiv swap again

Use SDValue instead of auto in a couple places.
Run clang-format

Fix capitalization of +experimental-zfh in test RUN lines.

frasercrmck accepted this revision.Feb 8 2021, 6:49 AM

LGTM. I don't know if others want to review it?


nit: This is 14.12 in v0.10. I'm changing that in the SD pats as part of D96028

This revision is now accepted and ready to land.Feb 8 2021, 6:49 AM
This revision was landed with ongoing or failed builds.Feb 8 2021, 10:43 AM
This revision was automatically updated to reflect the committed changes.
Jim added inline comments.Apr 19 2021, 6:38 PM


Could I ask you how to update this check label of this function?

I use to update this check label of this function.

But It deletes all LMULMAX1-RV32 and LMULMAX1-RV64 label and adds LMULMAX1.

No just update label which is already existed.

craig.topper added inline comments.Apr 19 2021, 8:54 PM

I just ran the script on the test on trunk and nothing change.