This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Use RISCV::RVVBitsPerBlock for RGK_ScalableVector in getRegisterBitWidth.
ClosedPublic

Authored by craig.topper on Aug 11 2021, 8:55 PM.

Details

Summary

I might be wrong, but I think this should be the width of the known
min size we use for scalable vectors. It shouldn't scale with
the minimum VLEN.

Diff Detail

Event Timeline

craig.topper created this revision. Aug 11 2021, 8:55 PM
craig.topper requested review of this revision. Aug 11 2021, 8:55 PM
Herald added a project: Restricted Project. Aug 11 2021, 8:55 PM
Herald added a subscriber: MaskRay.

Yeah I'm not sure, to be honest. It could be either meaning, going by a quick look around. Any ideas how we can know conclusively?

According to the description of getRegisterBitWidth, the function returns the width of the largest vector register type, which is probably where SVE and RVV are a bit different. For SVE the maximum vector length is always a multiple of 128 bits and bounded by the maximum vscale, so we can always return an ElementCount of 'vscale x 128'. The LV uses this to determine a suitable VF based on the widest element type. For example, if the maximum element width is 64 bits, the maximum VF would be "vscale x 2", whereas if the maximum element width is 32 bits, the maximum VF would be "vscale x 4". RVV can choose different LMULs, so you may want to return a wider bitwidth by default to get a more suitable vectorization factor, or alternatively experiment with adding a new RGK_* enum value to request a smaller/wider bitwidth. The LoopVectorizer also has an option to choose a higher bandwidth, "-vectorizer-maximize-bandwidth", which forces the LV to choose a higher bitwidth based on the smallest element type in the loop (instead of the biggest element type).
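
A minimal sketch of the arithmetic described above, not the LoopVectorizer's actual code. The helper names `minLanes` and `pickMinLanes` are hypothetical, and the model assumes the VF is simply the known-minimum register width divided by an element width.

```cpp
#include <cassert>

// Hypothetical helper (not an LLVM API): the N in a "vscale x N" VF for a
// register whose known-minimum width is KnownMinRegBits.
constexpr unsigned minLanes(unsigned KnownMinRegBits, unsigned EltBits) {
  return KnownMinRegBits / EltBits;
}

// Simplified model of the choice described above: size the VF by the widest
// element type by default, or by the smallest one when maximizing bandwidth.
constexpr unsigned pickMinLanes(unsigned KnownMinRegBits,
                                unsigned WidestEltBits,
                                unsigned SmallestEltBits,
                                bool MaximizeBandwidth) {
  return minLanes(KnownMinRegBits,
                  MaximizeBandwidth ? SmallestEltBits : WidestEltBits);
}

// Worked examples for a 128-bit known-minimum width (the SVE case).
inline void checkVFExamples() {
  assert(minLanes(128, 64) == 2); // "vscale x 2"
  assert(minLanes(128, 32) == 4); // "vscale x 4"
  // A loop mixing i64 and i8: default uses i64, maximize-bandwidth uses i8.
  assert(pickMinLanes(128, 64, 8, false) == 2);
  assert(pickMinLanes(128, 64, 8, true) == 16);
}
```

Under this model, returning 64 instead of 128 for the scalable register width halves every resulting VF.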

Ignoring LMUL for now, I think what is in the code right now is wrong, so I'd like something that is at least functionally correct. If I just want the vectorizer to use at most LMUL=1, I should return the fixed size of 64 that is used by our LMUL=1 types, <vscale x 1 x i64>, <vscale x 2 x i32>, <vscale x 4 x i16>? This is what RISCV::RVVBitsPerBlock represents.

frasercrmck accepted this revision. Aug 17 2021, 10:00 AM

Further to the information received in the SVE call, this seems like the correct thing to do.

This revision is now accepted and ready to land. Aug 17 2021, 10:00 AM
This revision was landed with ongoing or failed builds. Aug 17 2021, 11:13 AM
This revision was automatically updated to reflect the committed changes.

May I ask a question: why is RISCV::RVVBitsPerBlock set to 64? Is there any reference (such as an RFC) for this concept? Thanks.

We map RVV types to scalable vector types in IR like <vscale x 1 x i64>, where vscale is a runtime value calculated as (VLEN/RVVBitsPerBlock).

So <vscale x 1 x i64> is ((VLEN/RVVBitsPerBlock) x 1 x 64) bits, which simplifies to VLEN bits. Any type that simplifies to VLEN bits is an LMUL=1 type. A type smaller than VLEN represents a fractional LMUL. A larger one would be LMUL=2, 4, or 8.
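
The mapping above can be sketched as plain arithmetic. This is an illustration, not LLVM code: `ScalableType`, `vscaleFor`, `sizeInBits`, `LMUL`, and `lmulOf` are hypothetical names, and only the value 64 for RVVBitsPerBlock comes from this thread.

```cpp
#include <cassert>

// Value discussed in this thread; everything else here is hypothetical.
const unsigned RVVBitsPerBlock = 64;

// Models <vscale x MinElts x iEltBits>.
struct ScalableType {
  unsigned MinElts;
  unsigned EltBits;
};

// vscale for a given hardware VLEN (assumes VLEN is a multiple of 64).
inline unsigned vscaleFor(unsigned VLEN) { return VLEN / RVVBitsPerBlock; }

// Actual size of the type in bits on hardware with the given VLEN.
inline unsigned sizeInBits(ScalableType T, unsigned VLEN) {
  return vscaleFor(VLEN) * T.MinElts * T.EltBits;
}

// LMUL as a fraction Num/Den; exactly one of the two is greater than 1,
// or both are 1 for LMUL=1.
struct LMUL {
  unsigned Num;
  unsigned Den;
};

// LMUL is the type's known-minimum size relative to RVVBitsPerBlock.
inline LMUL lmulOf(ScalableType T) {
  unsigned KnownMin = T.MinElts * T.EltBits;
  if (KnownMin >= RVVBitsPerBlock)
    return LMUL{KnownMin / RVVBitsPerBlock, 1};
  return LMUL{1, RVVBitsPerBlock / KnownMin};
}

inline void checkLMULExamples() {
  ScalableType V1I64 = {1, 64}; // <vscale x 1 x i64>
  ScalableType V1I8 = {1, 8};   // <vscale x 1 x i8>
  ScalableType V8I64 = {8, 64}; // <vscale x 8 x i64>
  assert(sizeInBits(V1I64, 256) == 256);               // exactly VLEN bits
  assert(lmulOf(V1I64).Num == 1 && lmulOf(V1I64).Den == 1); // LMUL=1
  assert(lmulOf(V1I8).Num == 1 && lmulOf(V1I8).Den == 8);   // LMUL=1/8
  assert(lmulOf(V8I64).Num == 8 && lmulOf(V8I64).Den == 1); // LMUL=8
}
```

Note that LMUL depends only on the known-minimum size, so it is the same for every VLEN, while the size in bits scales with vscale.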

The value needs to be large enough so that we can support a fractional LMUL of 1/8 for i8 which is required for ELEN=64. With RVVBitsPerBlock==64 we can use <vscale x 1 x i8>. RVVBitsPerBlock also needs to be divisible by ELEN.
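
A rough sketch of the two constraints in the paragraph above. The function names are hypothetical, and the general rule used here (the smallest required fractional LMUL is 8/ELEN, matching the 1/8 stated for ELEN=64) is an assumption generalized from that one data point.

```cpp
#include <cassert>

// Denominator of the smallest fractional LMUL a block size can express:
// <vscale x 1 x i8> occupies 8 / BitsPerBlock of a register.
constexpr unsigned smallestLMULDenominator(unsigned BitsPerBlock) {
  return BitsPerBlock / 8;
}

// Denominator of the smallest fractional LMUL assumed to be required for
// i8 at a given ELEN (1/8 for ELEN=64, as stated above).
constexpr unsigned requiredLMULDenominator(unsigned ELEN) { return ELEN / 8; }

// Hypothetical check combining both constraints from the paragraph above.
constexpr bool blockSizeWorksFor(unsigned BitsPerBlock, unsigned ELEN) {
  return BitsPerBlock % ELEN == 0 &&
         smallestLMULDenominator(BitsPerBlock) >= requiredLMULDenominator(ELEN);
}

inline void checkBlockSizeExamples() {
  assert(smallestLMULDenominator(64) == 8); // <vscale x 1 x i8> is LMUL=1/8
  assert(blockSizeWorksFor(64, 64));        // the configuration in this thread
  assert(!blockSizeWorksFor(32, 64));       // too small to express LMUL=1/8
}
```

In this model a block size of 32 only becomes usable when ELEN is also 32, since `blockSizeWorksFor(32, 32)` holds while `blockSizeWorksFor(32, 64)` does not.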

RVVBitsPerBlock is the smallest VLEN we can support. I think we are going to need to select a value of 32 at compile time when targeting Zve32x or Zve32f. This will require all the intrinsic types to map to different LLVM IR types depending on which ELEN we are targeting.