Page MenuHomePhabricator

[RISCV] Use fractional LMULs for fixed length types smaller than riscv-v-vector-bits-min.

Authored by craig.topper on Apr 23 2021, 5:54 PM.



My thought process is that if v2i64 is an LMUL=1 type then v2i32
should be an LMUL=1/2 type. We limit the fractional LMUL so that
SEW=64 clips to LMUL=1, SEW=32 clips to LMUL=1/2, etc. This
ensures there's always a fractional LMUL available to truncate a type.
This does reduce the number of vsetvlis in some cases.

Some tests increase vsetvlis because the best container type for a
mask type is dependent on the LMUL+SEW that the mask was produced
from, but you can't tell that from the type. I think this is
something we need to solve this in the machine IR when optimizing

Diff Detail

Event Timeline

craig.topper created this revision.Apr 23 2021, 5:54 PM
craig.topper requested review of this revision.Apr 23 2021, 5:54 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 23 2021, 5:54 PM
Herald added a subscriber: MaskRay. · View Herald Transcript

The rationale looks reasonable to me.



frasercrmck added inline comments.Apr 28 2021, 4:27 AM

Maybe this comment could be corrected/clarified. I see what it's trying to convey but saying we can't go below LMUL=64/SEW makes it sound like we can't have fractional types at all since they'd always be at least 1.


Is VT.getSizeInBits() * EltsPerBlock just VT.getVectorNumElements() * RISCV::RVVBitsPerBlock?

Rebase and address review comments

craig.topper added inline comments.Apr 28 2021, 11:12 AM

Oops I think I was thinking in terms of the denominator when I wrote that. So it's really SEW/64.

frasercrmck added inline comments.Apr 29 2021, 1:15 AM

I do wonder about this. So with min=128, a v2i1 would use a nxv8i1 container type, but v2i8 => nxv1i8? Might that make certain operations more difficult if we don't have legal nxv1i1, nxv2i1, nxv4i1?

craig.topper added inline comments.Apr 29 2021, 9:12 AM

nxv1i1, nxv2i1, nxv4i1 are legal types they just won't be selected for mask only operations like vmand, vmor, vmxor, etc. They will be used for setcc results. Masks are messed up because a v2i16 setcc would use nxv1i16(lmul 1/4) and v2i8 setcc would use nxv1i8(lmul 1/8). We use nxv1i1 for the mask type for both. To minimize vsetvli changes vmand/vmor/vmxor/etc operations on the mask should use nxv2i1 if the producer was v2i16 or nxv1i1 if the producer was v2i8.

I think enabling fractional lmul here would make the i8 case correct but everything else would still be wrong. I guess I don't have a good reason not to do it. I'll try it and see what happens.

Use fractional lmul for masks too.
Modify the divideCeil slightly to divide MinVLen by RVVBitsPerBlock rather
than multiplying number of elements by RVVBitsPerBlock. Should be the same since
MinVLen is a power 2 that is at least 128. RVVBitsPerBlock is 64.

Matt added a subscriber: Matt.May 7 2021, 8:34 AM
frasercrmck added inline comments.May 10 2021, 6:43 AM

Do you know what's going on here? This strikes me as potentially a bug?


These are redundant but presumably we can optimize them away (with the new VSETVLI insertion pass)?

Rebase after fixing scatter VL calculation.

Updates for recently added tests

This revision is now accepted and ready to land.May 11 2021, 2:54 AM
This revision was landed with ongoing or failed builds.May 11 2021, 9:44 AM
This revision was automatically updated to reflect the committed changes.