This is an archive of the discontinued LLVM Phabricator instance.

Differential D142348

[RISCV][Docs] Document code generation for vector extension
Needs Review · Public

Authored by luke on Jan 23 2023, 3:55 AM.

Details

Reviewers

rogfer01
asb
craig.topper
kito-cheng
frasercrmck

Summary

Over the past few weeks I've been documenting my understanding of how code is generated for the vector extension.
I thought it would be useful to solidify this knowledge somewhere, so I have written a document that is largely based off of the original RFC, but updated for the current state of lib/Target/RISCV.

Specifically, it gives a walkthrough of how code is generated for

Scalable vectors
Fixed-length vectors
Vector predication instructions

It may be the case that this documentation is too implementation specific and will get outdated quickly, so let me know if there is a better place to share this knowledge.
And likewise, there may be parts that I'm misunderstanding, so please feel free to correct me!

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

luke created this revision. · Jan 23 2023, 3:55 AM

Herald added a project: Restricted Project. · Jan 23 2023, 3:55 AM

Herald added subscribers: asb, pmatos, VincentWu and 27 others.

luke requested review of this revision. · Jan 23 2023, 3:55 AM

Herald added a project: Restricted Project. · Jan 23 2023, 3:55 AM

Herald added subscribers: llvm-commits, alextsao1999, pcwang-thead and 2 others.

luke edited the summary of this revision. · Jan 23 2023, 4:05 AM

luke added reviewers: rogfer01, asb, craig.topper, kito-cheng, frasercrmck.

luke added inline comments. · Jan 23 2023, 4:07 AM

llvm/docs/RISCV/RISCVVectorExtension.rst
317	Am I correct in understanding that the main reason for the `VL` nodes is that it defers having to select an `LMUL`? Or is there another reason as to why they are used vs just selecting a pseudo instruction directly and using a constant for its VL operand

Rendered PDF

RISC-V Vector Extension — LLVM 16.0.0git documentation.pdf157 KBDownload

Thanks for kicking this off. Unfortunately I don't have the time right now to give an in-depth review.

Harbormaster completed remote builds in B209322: Diff 491293. · Jan 23 2023, 5:00 AM

I only did a quick review.

llvm/docs/RISCV/RISCVVectorExtension.rst
163	VLEN is not limited to 128 bits. It can be 32 or 64.
186	This mapping also prevents the value of vscale from being examined if ELEN and VLEN are both 32.
294	This file is also used for VP intrinsics on scalable vectors.

Matt added a subscriber: Matt. · Feb 6 2023, 9:14 AM

luke added inline comments. · Feb 6 2023, 3:02 PM

llvm/docs/RISCV/RISCVVectorExtension.rst
186	I'm not sure I understand, could you clarify what you mean by examined? Is it related to https://reviews.llvm.org/D128286
294	This is mentioned in the vector predication section below

Add section on standard vector extensions

craig.topper added inline comments. · Feb 6 2023, 3:19 PM

llvm/docs/RISCV/RISCVVectorExtension.rst

186

We can't use llvm.vscale or ISD::VSCALE if ELEN is 32. See this code in RISCVISelLowering.cpp, specifically the report_fatal_error call.

case ISD::VSCALE: {
  MVT VT = Op.getSimpleValueType();
  SDLoc DL(Op);
  SDValue VLENB = DAG.getNode(RISCVISD::READ_VLENB, DL, VT);
  // We define our scalable vector types for lmul=1 to use a 64 bit known
  // minimum size. e.g. <vscale x 2 x i32>. VLENB is in bytes so we calculate
  // vscale as VLENB / 8.
  static_assert(RISCV::RVVBitsPerBlock == 64, "Unexpected bits per block!");
  if (Subtarget.getRealMinVLen() < RISCV::RVVBitsPerBlock)
    report_fatal_error("Support for VLEN==32 is incomplete.");

craig.topper added inline comments. · Feb 6 2023, 3:25 PM

llvm/docs/RISCV/RISCVVectorExtension.rst
317	Selecting a pseudoinstruction from lowering would make it difficult to do other optimizations without checking different combinations of pseudoinstruction opcodes. We can't use the fixed vector types in the isel patterns because the mapping from fixed vector type to LMUL isn't static and isel patterns require the types to be explicitly mentioned in the patterns.

luke added inline comments. · Feb 6 2023, 3:39 PM

llvm/docs/RISCV/RISCVVectorExtension.rst
186	Thanks, that makes sense.

Add note about mapping of types when VLEN=32

Harbormaster completed remote builds in B212224: Diff 495299. · Feb 6 2023, 4:39 PM

LWenH added a subscriber: LWenH. · Jul 1 2023, 9:56 PM

Herald added subscribers: wangpc, jobnoorman. · Jul 1 2023, 9:56 PM

LWenH added inline comments. · Jul 1 2023, 9:59 PM

llvm/docs/RISCV/RISCVVectorExtension.rst
212	Hi, as a doc reader I'm quite confused about why the i128 type would need 1/2 under LMUL=1 here. In order to represent the i128 type, shouldn't vscale be extended to 2 (LMUL=1) to be more direct?

barannikov88 added a subscriber: barannikov88. · Jul 1 2023, 10:03 PM

It would be great if there was a section describing how vector registers are spilled / restored.
Or, more generally, how the stack space is allocated for registers that don't have a length known at compile time.

llvm/docs/RISCV/RISCVVectorExtension.rst
20	"is also a power of two" repeats the previous sentence.
35	Suggested edit: "operate on the (32) vector registers" → "operate on whole vector registers".
40–42	Should the group names be the same as in the "Register classes" section, v0m4 etc.?
128	Suggested edit: ``ta,mu`` → ``ta,ma``.

evandro removed a subscriber: evandro. · Jul 2 2023, 4:03 PM

luke added inline comments. · Jul 4 2023, 3:45 AM

llvm/docs/RISCV/RISCVVectorExtension.rst
212	At LMUL=1, the minimum vector size should be 64 bits, but you can't specify the minimum vector length as anything less than 1 e.g. `<vscale x ½ x i128>`, if that makes sense. I guess it could represent i128 vectors at LMUL=2 with `<vscale x 1 x i128>` though

Revision Contents

Path

Size

llvm/

docs/

RISCV/

RISCVVectorExtension.rst

407 lines

UserGuides.rst

3 lines

Diff 495299

llvm/docs/RISCV/RISCVVectorExtension.rst

This file was added.

=========================
 RISC-V Vector Extension
=========================

.. contents::
   :local:

The RISC-V Vector extension provides vector computation capabilities to the RISC-V architecture [RVV]_.

This guide is based off the original RFC proposing code generation for the extension [RVV-CodeGen-RFC]_, and briefly outlines the features of the extension, as well as giving an overview of how the RISC-V backend generates code for it.

Overview
========

The vector extension adds 32 vector registers ``v0``, ``v1``, ..., ``v31`` to the ISA.

Unlike typical SIMD ISAs, the size in bits of each vector register is an implementation-specific parameter called ``VLEN`` and must be a power of two.

``VLEN`` may also have additional constraints depending on the exact vector extension, see :ref:`standard vector extensions` for more details.

Vector registers are partitioned (i.e. densely packed) into elements whose size in bits is a power of two, ranging from 8 to a maximum called ``ELEN``.

``ELEN`` is also a power of two and :math:`\texttt{ELEN} \leq \texttt{VLEN}`.

Due to encoding constraints, not all the operands of a vector operation are encoded in the instructions themselves.

Two CSRs (control and status registers) are used instead:

- ``vl``: the number of elements being operated on, called the vector length. A vector instruction operates on elements ``0`` to ``vl-1``.

- ``vtype``: the vector type. This register encodes the element size of the operation, called the standard element width (SEW), and a vector grouping mechanism called the length multiplier (LMUL).

Length multiplier
-----------------

The length multiplier (LMUL) can take values 1, 2, 4, 8, 1/2, 1/4, 1/8.

It is encoded as a power of two, where :math:`\text{LMUL} = 2^k, -3 \leq k \leq 3`.

- When :math:`\text{LMUL} = 1` the vector instructions operate on whole vector registers.

- When :math:`\text{LMUL} \lt 1` the vector instructions operate on the lowest half, quarter or eighth of a vector register.

- When :math:`\text{LMUL} \gt 1` the vector instructions operate on vector groups encoded in the instruction using the lowest numbered vector register of the group.

A vector group is the set of consecutive vector registers ``v{LMUL*i}``, ``v{LMUL*i+1}``, ... , ``v{LMUL*(i + 1) - 1}``. So

- :math:`\text{LMUL}=2` has 16 groups: ``v0``, ``v2``, ``v4``, ..., ``v28``, ``v30``

- :math:`\text{LMUL}=4` has 8 groups: ``v0``, ``v4``, ``v8``, ``v12``, ``v16``, ``v20``, ``v24``, ``v28``

- :math:`\text{LMUL}=8` has 4 groups: ``v0``, ``v8``, ``v16``, ``v24``

For instance, under :math:`\text{LMUL}=4`, a vector group ``v4`` operand includes vector registers ``v4``, ``v5``, ``v6`` and ``v7`` as if they had been concatenated as a four times larger vector register.

LMUL is useful to align the number of elements in vector code whose element sizes differ (say when combining vectors of 32- and 64-bit elements) or when doing *widenings* (zero, sign or fp extensions) or *narrowings* (truncations).
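The grouping rule above can be sketched in a few lines of Python. This is only an illustration of the numbering scheme; the helper names ``vector_groups`` and ``group_members`` are hypothetical and do not correspond to anything in the LLVM sources.

```python
def vector_groups(lmul):
    """Base register numbers of every vector group for an integer LMUL."""
    if lmul not in (1, 2, 4, 8):
        raise ValueError("integer LMUL must be 1, 2, 4 or 8")
    # Groups start at every multiple of LMUL among the 32 registers.
    return list(range(0, 32, lmul))


def group_members(base, lmul):
    """Registers covered by the group that is named by its lowest register."""
    if base % lmul != 0:
        raise ValueError("a group must start at a multiple of LMUL")
    return [f"v{r}" for r in range(base, base + lmul)]


print(vector_groups(8))     # [0, 8, 16, 24]
print(group_members(4, 4))  # ['v4', 'v5', 'v6', 'v7']
```

A register number that is not a multiple of LMUL does not name a group, which is why the second helper rejects it.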

Setting ``vl`` and ``vtype``
----------------------------

A program must ensure that both ``vl`` and ``vtype`` have the correct values for a vector operation before executing a vector instruction.

This is done using the ``vsetvli`` instruction.

.. code-block:: nasm

   vsetvli rdest, rsrc, sew,lmul,tx,mx # tx,mx is described in Masks and tails

``rsrc`` is the application vector length (AVL) and will be used when setting the ``vl``. ``rdest`` is updated with the value of ``vl``.

The spec allows some latitude here but a simple functional model of what ``vsetvli`` does is the following:

.. math::

   \text{vl} &\gets \min(\text{rsrc}, \frac{\text{LMUL} \times \text{VLEN}}{\text{SEW}}) \\
   \text{vtype} &\gets \text{SEW},\text{LMUL},\dots

There is also ``vsetivli`` for when the AVL is an immediate, and ``vsetvl`` for when the AVL and ``vtype`` are both registers.
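As a rough illustration, the functional model above can be written as a small Python function. It deliberately ignores the latitude the spec allows and assumes the SEW/LMUL combination is supported; the function name ``vsetvli_model`` and its parameters are hypothetical.

```python
from fractions import Fraction


def vsetvli_model(avl, vlen, sew, lmul):
    """Return the vl chosen by the simple model: min(AVL, LMUL * VLEN / SEW)."""
    # Fraction keeps fractional LMUL values such as 1/2 exact.
    vlmax = int(Fraction(lmul) * vlen / sew)
    return min(avl, vlmax)


# VLEN=128, SEW=32, LMUL=2 gives VLMAX = 8.
print(vsetvli_model(10, 128, 32, 2))              # 8
print(vsetvli_model(3, 128, 32, 2))               # 3
print(vsetvli_model(100, 128, 64, Fraction(1, 2)))  # 1
```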

``vsetvli`` has a couple of special cases:

- When ``rsrc`` is ``x0`` and ``rdest`` is not ``x0`` then :math:`\text{vl} \gets \text{LMUL} \times \frac{\text{VLEN}}{\text{SEW}}`.
  In other words, it sets ``vl`` to the maximum vector length for the given LMUL and SEW.
  This is useful for whole-register operations.

  .. code-block:: nasm

     vsetvli t0, x0, e32,m2,ta,ma # vl ← 2*VLEN/32
                                  # vtype ← e32,m2,…
                                  # t0 ← vl

- When ``rsrc`` and ``rdest`` are both ``x0`` (the hard-coded zero of RISC-V) then ``vl`` is used as the AVL. This can be used to change the ``vtype`` when we know the ratio :math:`\frac{\text{SEW}}{\text{LMUL}}` will be preserved.

  .. code-block:: nasm

     vsetvli x0, x0, e64,m4,ta,ma # changing vtype from e32,m2 to e64,m4 is OK (vl is unchanged)
                                  # vtype ← e64,m4,…
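The reason the ratio matters is that the maximum vector length depends only on SEW/LMUL for a given ``VLEN``. A small Python check, using the hypothetical helpers ``vlmax`` and ``same_ratio``, makes this concrete:

```python
from fractions import Fraction


def vlmax(vlen, sew, lmul):
    """Maximum vector length for a given VLEN, SEW and LMUL."""
    return Fraction(lmul) * vlen / sew


def same_ratio(old, new):
    """old and new are (SEW, LMUL) pairs; True if SEW/LMUL is unchanged."""
    return Fraction(old[0]) / Fraction(old[1]) == Fraction(new[0]) / Fraction(new[1])


# e32,m2 -> e64,m4 keeps SEW/LMUL = 16, so VLMAX (and hence vl) is unchanged.
print(same_ratio((32, 2), (64, 4)))              # True
print(vlmax(128, 32, 2) == vlmax(128, 64, 4))    # True
# e32,m2 -> e64,m2 changes the ratio, so vl might no longer be valid.
print(same_ratio((32, 2), (64, 2)))              # False
```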

Two simple examples (register ``x10`` contains the AVL):

- Add two 32-bit element vectors under :math:`\text{LMUL}=1`

  .. code-block:: nasm

     vsetvli x0, x10, e32,m1,ta,ma
     vadd.vv v1, v2, v3 # v1[0:vl-1] ← v2[0:vl-1] + v3[0:vl-1]
                        # where v[i:j] is all v[x] where i <= x <= j

- Add two 64-bit element vectors under :math:`\text{LMUL}=2`

  .. code-block:: nasm

     vsetvli x0, x10, e64,m2,ta,ma
     vadd.vv v2, v4, v6 # Updates v2 and v3. Reads v4, v5 and v6, v7
                        # v2[0:x-1] ← v4[0:x-1] + v6[0:x-1] where x = min(VLEN/64, vl)
                        # v3[0:y-1] ← v5[0:y-1] + v7[0:y-1] where y = vl - x
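To make the ``x`` and ``y`` split in the LMUL=2 example concrete, the following Python sketch computes how many of the ``vl`` elements land in each register of a group. The function name ``elements_per_register`` is hypothetical and the model assumes an integer LMUL.

```python
def elements_per_register(vl, vlen, sew, lmul):
    """Number of active elements held by each register of a vector group."""
    per_register = vlen // sew  # elements that fit in one register
    counts = []
    remaining = vl
    for _ in range(lmul):
        n = min(per_register, remaining)
        counts.append(n)
        remaining -= n
    return counts


# VLEN=128, SEW=64, LMUL=2: each register holds two 64-bit elements.
print(elements_per_register(3, 128, 64, 2))  # [2, 1] -> x = 2, y = 1
```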

.. note::

   ``vsetvli`` is commonly used for stripmining, like in the example below:

   .. code-block:: nasm

      # on entry:
      #   a0 holds the total number of elements
      #   a1 holds the address of the source array
      loop:
        vsetvli t0, a0, e32,m8,ta,ma # set up VL, LMUL=8
        vle32.v v8, (a1)             # load elements
        vadd.vi v8, v8, 1            # process elements
        vse32.v v8, (a1)             # store updated elements
        sub a0, a0, t0               # decrement count
        slli t0, t0, 2               # scale vl to a byte offset (4 bytes per element)
        add a1, a1, t0               # advance the address
        bnez a0, loop                # loop until all processed

The way you would read the ``vsetvli`` is as follows:

- ``e32,m8``: Group the registers together into groups of 8 (:math:`\text{LMUL}=8`) and partition them into 32-bit elements.

- ``ta,ma``: Be tail agnostic and mask agnostic: We don't care about what's in the elements that aren't processed.

- ``a0``: Try and process ``a0`` elements, or as many as the hardware supports.

- ``t0``: Store ``vl``, i.e. the number of elements that will be processed this iteration
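The control flow of the stripmining loop can be mimicked in Python. This is a sketch under stated assumptions: ``VLEN`` is picked arbitrarily as 128, ``vl`` follows the simple ``min(AVL, VLMAX)`` model, and unlike the assembly (which always executes its body once) the sketch skips the loop for empty input. The name ``stripmine_add_one`` is hypothetical.

```python
def stripmine_add_one(data, vlen=128, sew=32, lmul=8):
    """Add 1 to every element, processing at most VLMAX elements per iteration."""
    vlmax = lmul * vlen // sew
    out = list(data)
    remaining = len(out)  # plays the role of a0
    offset = 0            # plays the role of a1, counted in elements
    iterations = 0
    while remaining != 0:
        vl = min(remaining, vlmax)            # vsetvli t0, a0, e32,m8,ta,ma
        for i in range(offset, offset + vl):  # vle32.v / vadd.vi / vse32.v
            out[i] += 1
        remaining -= vl                       # sub a0, a0, t0
        offset += vl                          # slli + add on a1
        iterations += 1
    return out, iterations


result, iterations = stripmine_add_one(range(70))
print(iterations)  # 3: chunks of 32, 32 and 6 elements
```

The last iteration handles the 6 leftover elements without a scalar epilogue, which is the point of letting ``vsetvli`` pick ``vl``.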

.. _masks and tails:

Masks and tails
---------------

The RISC-V Vector extension supports masks in almost all of its instructions.

There are no distinguished mask registers, instead vector registers can be used to represent masks.

However an instruction whose execution is masked can only use the ``v0`` register as the mask operand.

Elements of the destination register that are masked off by the mask are called *inactive elements*.

A vector instruction can be executed under a ``vl`` setting where :math:`\texttt{vl} \lt \text{LMUL} \times \frac{\texttt{VLEN}}{\text{SEW}}`.

Elements of the destination register past the current ``vl`` are called the tail elements.

There are two modes for the tail and inactive elements:

- undisturbed, in which the element of the destination register is left unmodified

- agnostic, in which the element of the destination register is either left unmodified or has all its bits set to 1 (for debugging purposes). In this mode we cannot assume anything about the bits of those elements

``tx,mx`` in ``vsetvli`` above correspond to these two policies and can be combined in 4 ways:

- ``tu,mu``: Both tail and inactive are left undisturbed

- ``ta,ma``: Both tail and inactive are agnostic

- ``tu,ma``: Tail is left undisturbed and inactive are agnostic

- ``ta,mu``: Tail is agnostic and inactive are left undisturbed.
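The effect of the four policy combinations on a destination register can be sketched as follows. Agnostic elements are represented by a sentinel because their contents cannot be relied upon; ``AGNOSTIC`` and ``apply_policies`` are hypothetical names introduced only for this illustration.

```python
AGNOSTIC = object()  # stands for "old value or all ones, we cannot tell which"


def apply_policies(old_dest, result, mask, vl, tail_agnostic, mask_agnostic):
    """Return the destination contents after a masked operation."""
    new_dest = []
    for i, old in enumerate(old_dest):
        if i >= vl:          # tail element
            new_dest.append(AGNOSTIC if tail_agnostic else old)
        elif not mask[i]:    # inactive (masked-off) element
            new_dest.append(AGNOSTIC if mask_agnostic else old)
        else:                # active element receives the result
            new_dest.append(result[i])
    return new_dest


old = [10, 11, 12, 13]
res = [1, 2, 3, 4]
mask = [True, False, True, True]
print(apply_policies(old, res, mask, 3, False, False))  # tu,mu: [1, 11, 3, 13]
```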

.. _standard vector extensions:

Standard vector extensions
--------------------------

Formally, the vector extension exists in multiple variants, each of which imposes additional constraints on ``VLEN`` and ``EEW`` (the effective ``SEW`` for a specific vector operand):

``Zvl*``
  Extensions of the form ``Zvl32b``, ``Zvl64b``, etc.
  These don't actually contain any instructions but just dictate the minimum required ``VLEN``.
  All the extensions below require one of the ``Zvl`` extensions.

``Zve``
  A smaller subset of the vector extension designed for use in embedded devices.
  Specifies a minimum ``VLEN`` and the range of supported ``EEW`` values.
  For example, ``Zve32x`` requires ``Zvl32b`` and supports ``EEW = {8, 16, 32}``.
  ``Zve64f`` requires ``Zvl64b``, supports ``EEW = {8, 16, 32, 64}`` and also provides 32-bit floating point instructions.

``v``
  This is the single letter version of the vector extension intended for use in application contexts.
  It requires ``Zvl128b`` as well as the ``f`` and ``d`` extensions, and provides all the instructions defined in the specification.

Mapping to LLVM IR Types
========================

Since ``VLEN`` is an unknown constant from the compiler's perspective, the RISC-V backend takes the same approach as AArch64's SVE and uses scalable vector types [SVE-RFC]_.

Scalable vector types are of the form ``<vscale x n x ty>``, which indicate a vector with a multiple of ``n`` elements of type ``ty``.

LLVM supports only ``ELEN=32`` or ``ELEN=64``, so ``vscale`` is defined as ``VLEN/64``.

This makes the LLVM IR types stable between the two ``ELEN`` s considered, i.e. every LLVM IR scalable vector type has exactly one corresponding pair of element type and LMUL, and vice-versa.

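+-------------------+---------------+----------------+------------------+-------------------+-------------------+-------------------+-------------------+
|                   | LMUL=1/8      | LMUL=1/4       | LMUL=1/2         | LMUL=1            | LMUL=2            | LMUL=4            | LMUL=8            |
+===================+===============+================+==================+===================+===================+===================+===================+
| i64 (ELEN=64)     | N/A           | N/A            | N/A              | <v x 1 x i64>     | <v x 2 x i64>     | <v x 4 x i64>     | <v x 8 x i64>     |
+-------------------+---------------+----------------+------------------+-------------------+-------------------+-------------------+-------------------+
| i32               | N/A           | N/A            | <v x 1 x i32>    | <v x 2 x i32>     | <v x 4 x i32>     | <v x 8 x i32>     | <v x 16 x i32>    |
+-------------------+---------------+----------------+------------------+-------------------+-------------------+-------------------+-------------------+
| i16               | N/A           | <v x 1 x i16>  | <v x 2 x i16>    | <v x 4 x i16>     | <v x 8 x i16>     | <v x 16 x i16>    | <v x 32 x i16>    |
+-------------------+---------------+----------------+------------------+-------------------+-------------------+-------------------+-------------------+
| i8                | <v x 1 x i8>  | <v x 2 x i8>   | <v x 4 x i8>     | <v x 8 x i8>      | <v x 16 x i8>     | <v x 32 x i8>     | <v x 64 x i8>     |
+-------------------+---------------+----------------+------------------+-------------------+-------------------+-------------------+-------------------+
| double (ELEN=64)  | N/A           | N/A            | N/A              | <v x 1 x double>  | <v x 2 x double>  | <v x 4 x double>  | <v x 8 x double>  |
+-------------------+---------------+----------------+------------------+-------------------+-------------------+-------------------+-------------------+
| float             | N/A           | N/A            | <v x 1 x float>  | <v x 2 x float>   | <v x 4 x float>   | <v x 8 x float>   | <v x 16 x float>  |
+-------------------+---------------+----------------+------------------+-------------------+-------------------+-------------------+-------------------+
| half              | N/A           | <v x 1 x half> | <v x 2 x half>   | <v x 4 x half>    | <v x 8 x half>    | <v x 16 x half>   | <v x 32 x half>   |
+-------------------+---------------+----------------+------------------+-------------------+-------------------+-------------------+-------------------+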

(Read ``<v x k x ty>`` as ``<vscale x k x ty>``)
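Since ``vscale`` is defined as ``VLEN/64``, the element count of the scalable type for a given SEW and LMUL follows from :math:`n = 64 \times \text{LMUL} / \text{SEW}`. The Python sketch below applies that rule; ``scalable_type`` is a hypothetical helper, and the constant mirrors ``RISCV::RVVBitsPerBlock`` quoted in the review discussion.

```python
from fractions import Fraction

RVV_BITS_PER_BLOCK = 64  # vscale = VLEN / 64


def scalable_type(element_type, sew, lmul):
    """LLVM IR type for an element type of width SEW at a given LMUL, or None."""
    n = Fraction(lmul) * RVV_BITS_PER_BLOCK / sew
    if n < 1 or n.denominator != 1:
        return None  # would need a fractional element count
    return f"<vscale x {n.numerator} x {element_type}>"


print(scalable_type("i32", 32, 2))                # <vscale x 4 x i32>
print(scalable_type("i64", 64, Fraction(1, 2)))   # None
print(scalable_type("i8", 8, Fraction(1, 8)))     # <vscale x 1 x i8>
```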

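One downside of this design is that it doesn't allow vectors of ``i128`` (that is, ``ELEN=128``).

At :math:`\text{LMUL}=1` the known minimum vector size is 64 bits, so such a vector would need an element count of 1/2, i.e. ``<vscale x 1/2 x i128>``, which cannot be expressed.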

This type (and its fp counterpart float128) is not that common, and if really necessary the types for :math:`\text{LMUL}=2` could be used instead.

Additionally, this design prevents us from being able to compute a value for ``vscale`` when ``VLEN=32``.
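The ``vscale`` limitation follows from the arithmetic: the lowering quoted in the review computes ``vscale`` as ``VLENB / 8``, which has no integer value of at least 1 when ``VLEN`` is 32. A Python sketch of that calculation (``vscale_from_vlenb`` is a hypothetical name, not an LLVM function):

```python
def vscale_from_vlenb(vlenb):
    """Compute vscale from VLENB (the vector register length in bytes)."""
    vlen = vlenb * 8
    if vlen < 64:
        # Mirrors the fatal error in the backend for VLEN=32.
        raise ValueError("vscale cannot be computed when VLEN < 64")
    return vlenb // 8


print(vscale_from_vlenb(16))  # VLEN=128 -> vscale = 2
```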

Mask vector types
-----------------

As for mask vectors, they are physically represented using a layout of densely packed bits in a vector register.

They are mapped to the following LLVM IR types:

- ``<vscale x 1 x i1>``
- ``<vscale x 2 x i1>``
- ``<vscale x 4 x i1>``
- ``<vscale x 8 x i1>``
- ``<vscale x 16 x i1>``
- ``<vscale x 32 x i1>``
- ``<vscale x 64 x i1>``

Two types with the same ratio SEW/LMUL will have the same related mask type. For instance, two different comparisons, one under SEW=64, LMUL=2 and the other under SEW=32, LMUL=1, will both generate a mask ``<vscale x 2 x i1>``.
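The mask type depends only on the SEW/LMUL ratio, which a short Python sketch can confirm. It assumes the same ``vscale = VLEN/64`` convention as above; ``mask_type`` is a hypothetical helper.

```python
from fractions import Fraction


def mask_type(sew, lmul):
    """Mask type produced by a comparison under the given SEW and LMUL."""
    n = Fraction(lmul) * 64 / sew  # same element count as the data vector
    if n < 1 or n.denominator != 1:
        return None
    return f"<vscale x {n.numerator} x i1>"


print(mask_type(64, 2))  # <vscale x 2 x i1>
print(mask_type(32, 1))  # <vscale x 2 x i1>
```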

Register classes
================

There are four register classes for vectors:

- ``VR`` for vector registers (``v0``, ``v1``, ..., ``v31``). Used when :math:`\text{LMUL} \leq 1` and for mask registers.

- ``VRM2`` for vector groups of length 2 i.e. :math:`\text{LMUL}=2` (``v0m2``, ``v2m2``, ..., ``v30m2``)

- ``VRM4`` for vector groups of length 4 i.e. :math:`\text{LMUL}=4` (``v0m4``, ``v4m4``, ..., ``v28m4``)

- ``VRM8`` for vector groups of length 8 i.e. :math:`\text{LMUL}=8` (``v0m8``, ``v8m8``, ..., ``v24m8``)

:math:`\text{LMUL} \lt 1` types and mask types do not benefit from having a dedicated class, so ``VR`` is used in their case.
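The choice of register class from LMUL, as described in the list above, amounts to a simple lookup. The Python helper below is a hypothetical illustration, not code from the backend.

```python
from fractions import Fraction


def register_class(lmul):
    """Name of the register class used for a given LMUL."""
    lmul = Fraction(lmul)
    if lmul <= 1:
        return "VR"  # fractional LMUL and LMUL=1 share the plain class
    return f"VRM{lmul.numerator}"


print(register_class(Fraction(1, 4)))  # VR
print(register_class(4))               # VRM4
```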

.. _scalable vector codegen:

Scalable Vector Codegen
=======================

Let's consider a very simple case using a whole-register op (this example uses :math:`\text{LMUL}=2`):

.. code-block:: llvm

   %c = add <vscale x 4 x i32> %a, %b

From the above we get the following ISel DAG:

.. code-block::

   t5: nxv4i32 = add t2, t4

Which then gets selected as a pseudo instruction:

.. code-block::

   t6: nxv4i32 = PseudoVADD_VV_M2 t2, t4, TargetConstant:i32<-1>, TargetConstant:i32<5>

Each vector instruction has multiple pseudo instructions defined in ``RISCVInstrInfoVPseudos.td``, with their patterns defined in ``RISCVInstrInfoVSDPatterns.td``.

For example, ``VADD_VV`` has pseudo instructions for ``PseudoVADD_VV_M1``, ``PseudoVADD_VV_M2``, and so on.

The ``M2`` suffix means that we're operating on groups of :math:`\text{LMUL}=2`, and the ``VV`` suffix means we're doing a vector-vector operation (i.e. ``vadd.vv``).

Other suffixes include ``VX`` for vector-scalar and ``VI`` for vector-immediate.

The first two operands ``t2`` and ``t4`` to the pseudo instruction are the inputs to the regular ``VADD_VV`` instruction, ``vs2`` and ``vs1`` respectively (RVV assembly lists ``vs2`` before ``vs1``).

The third is the AVL, i.e. how many elements we want to operate on, and is of type ``XLenVT``. It's set to -1 here because we want to operate on all the elements.

.. note::

   Pseudo instructions ending in ``TU`` are executed in tail undisturbed mode (see :ref:`masks and tails`).
   They take an additional merge operand which is a vector whose elements should be preserved in the tail.

The last operand is SEW, encoded as its base-2 logarithm, which is ``5`` here (:math:`32 = 2^5`).

The AVL and SEW operands aren't actually part of the ``vadd.vv`` instruction, but instead are used by the ``RISCVInsertVSETVLI.cpp`` pass to insert the necessary ``vsetvli`` instruction in front of it, after which the MIR looks like this:

.. code-block::

   dead %3:gpr = PseudoVSETVLIX0 $x0, 209, implicit-def $vl, implicit-def $vtype
   %2:vrm2 = PseudoVADD_VV_M2 %0:vrm2, %1:vrm2, -1, 5, implicit $vl, implicit $vtype

Now the physical ``$vl`` and ``$vtype`` registers are set up correctly after being implicitly defined by the ``VSETVLI``, after which they are then implicitly used by the ``VADD``.

See ``RISCVVType::encodeVTYPE`` for details on how ``vtype`` is encoded (``209`` in this example).
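For readers who want to check where ``209`` comes from, here is a Python sketch based on the ``vtype`` bit layout in the RVV 1.0 specification (``vlmul`` in bits 2:0, ``vsew`` in bits 5:3, ``vta`` in bit 6, ``vma`` in bit 7). That ``RISCVVType::encodeVTYPE`` produces exactly this layout is an assumption here, and ``encode_vtype`` is a hypothetical Python helper rather than the LLVM function.

```python
def encode_vtype(sew, lmul_log2, tail_agnostic, mask_agnostic):
    """Pack SEW, LMUL and the tail/mask policies into a vtype immediate."""
    vsew = {8: 0, 16: 1, 32: 2, 64: 3}[sew]
    # LMUL = 2**lmul_log2; fractional values use 3-bit two's complement.
    vlmul = lmul_log2 & 0b111
    return (int(mask_agnostic) << 7) | (int(tail_agnostic) << 6) | (vsew << 3) | vlmul


# e32,m2,ta,ma from the example above.
print(encode_vtype(32, 1, True, True))  # 209
```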

.. note::

   It is not necessary to emit a ``vsetvli`` instruction before every vector instruction if the current ``vl`` and ``vtype`` are still suitable for the intended vector operation, and ``RISCVInsertVSETVLI.cpp`` takes this into account:
   it won't insert an instruction if neither ``vl`` nor ``vtype`` change.

After register allocation, the ``RISCVExpandPseudoInsts.cpp`` pass then expands the ``PseudoVSETVLI``.

.. code-block::

   dead $x10 = VSETVLI $x0, 209, implicit-def $vtype, implicit-def $vl
   renamable $v8m2 = PseudoVADD_VV_M2 killed renamable $v8m2, killed renamable $v10m2, -1, 5, implicit $vl, implicit $vtype

Finally ``AsmPrinter`` lowers the pseudo instructions into real ``MCInst`` objects, discarding unneeded operands.

Note that the existing pseudo instruction remains until MCInst lowering.

See ``lowerRISCVVMachineInstrToMCInst`` to see how the pseudo instruction is matched up with the actual instruction.

.. code-block:: nasm

   vsetvli a0, zero, e32,m2,ta,ma
   vadd.vv v8, v8, v10

Fixed Length Vector Codegen
===========================

As shown above, instruction selection works on scalable vectors, that is vectors with a type like ``<vscale x n x t>``.

Fixed-length vectors like ``<n x t>`` therefore need to be converted to scalable vectors first.

To assist with this, an intermediate layer of nodes that take an explicit ``VL`` operand is used.

The nodes and their patterns are defined in ``RISCVInstrInfoVVLPatterns.td``.

For example, for the following LLVM IR on a fixed-length vector of 4 elements:

.. code-block:: llvm

   %x = add <4 x i32> %a, %b

The initial ISel DAG will look like this:

.. code-block::

   t4: v4i32 = extract_subvector t2, Constant:i32<0>
   t7: v4i32 = extract_subvector t6, Constant:i32<0>
   t8: v4i32 = add t4, t7

But instead of being lowered to a ``PseudoVADD_VV``, it gets converted to a scalable vector and an ``ADD_VL`` SDNode is selected:

.. code-block::

   t15: nxv2i1 = RISCVISD::VMSET_VL Constant:i32<4>
   t16: nxv2i32 = RISCVISD::ADD_VL t2, t6, undef:nxv2i32, t15, Constant:i32<4>

These ``_VL`` suffixed nodes are counterparts to their pseudo instructions, but don't specify LMUL and are tagged with a ``VL`` operand, which is 4 here.

It will be later used by the pass inserting ``vsetvli`` so that it can statically set ``VL`` to the number of elements in the fixed-length vector.

.. note::

   Because the ``vadd`` can be masked, the third operand on this VL node is a merge operand that is used for undisturbed semantics (it is ``undef`` in this example). This operand is tied to the destination. If it is an actual value it entails ``tu,mu`` (see :ref:`masks and tails`).

   The following operand is a mask operand of type ``<n x i1>``, which is set by ``VMSET``.
   ``VMSET`` is a RISC-V pseudo instruction (not an LLVM pseudo instruction) that sets the destination register bits to all ones, so this is the equivalent of not using a mask.
   Its operand is the AVL.

   The final operand is the explicit ``VL``, of type ``XLenVT``.

It is then selected as the corresponding pseudo instruction with a suitable LMUL:

.. code-block::

   t15: nxv2i1 = PseudoVMSET_M_B2 TargetConstant:i32<4>, TargetConstant:i32<0>
   t22: ch,glue = CopyToReg t0, Register:nxv2i1 $v0, t15
   t16: nxv2i32 = PseudoVADD_VV_M1_MASK undef:nxv2i32, t2, t6, Register:nxv2i1 $v0, TargetConstant:i32<4>, TargetConstant:i32<5>, TargetConstant:i32<1>, t22:1

During post-processing, ``RISCVDAGToDAGISel::doPeepholeMaskedRVV`` then detects that the mask in ``$v0`` is all ones and converts the masked form to the unmasked form:

.. code-block::

   t24: nxv2i32 = PseudoVADD_VV_M1 t2, t6, TargetConstant:i32<4>, TargetConstant:i32<5>

Code generation then proceeds as normal as shown in :ref:`scalable vector codegen`.

Vector Predication instructions
===============================

Similarly to fixed-length vectors, vector predication intrinsics are lowered to ``VL`` nodes first. So the use of the following ``@llvm.vp`` intrinsic

.. code-block:: llvm

   %x = call <vscale x 4 x i32> @llvm.vp.add.nxv4i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, <vscale x 4 x i1> %m, i32 4)

Enters the DAG as a ``vp_add`` node:

.. code-block::

   t10: nxv4i32 = vp_add t2, t4, t6, t8

Which ``RISCVTargetLowering::lowerVPOp`` then lowers into the corresponding ``VL`` node:

.. code-block::

   t15: nxv4i32 = RISCVISD::ADD_VL t2, t4, undef:nxv4i32, t6, Constant:i32<4>

And subsequently the corresponding masked pseudo instruction, where the mask is copied into ``$v0``:

.. code-block::

   t6: nxv4i1,ch = CopyFromReg t0, Register:nxv4i1 %2
   t20: ch,glue = CopyToReg t0, Register:nxv4i1 $v0, t6
   t16: nxv4i32 = PseudoVADD_VV_M2_MASK IMPLICIT_DEF:nxv4i32, t2, t4, Register:nxv4i1 $v0, t8, TargetConstant:i32<5>, TargetConstant:i32<1>, t20:1

References
==========

.. [RVV] `RISC-V "V" Vector Extension <https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc>`_

.. [RVV-CodeGen-RFC] `[llvm-dev] [RFC] Code generation for RISC-V V-extension <https://lists.llvm.org/pipermail/llvm-dev/2020-October/145850.html>`_

.. [SVE-RFC] `[RFC][SVE] Supporting SIMD instruction sets with variable vector lengths <https://lists.llvm.org/pipermail/llvm-dev/2018-July/124396.html>`_

llvm/docs/UserGuides.rst

[unchanged lines elided]
.. toctree::
NewPassManager
NVPTXUsage
Phabricator
Passes
ReportingGuide
ResponseGuide
Remarks
RISCVUsage
+ RISCV/RISCVVectorExtension
SourceLevelDebugging
SPIRVUsage
StackSafetyAnalysis
SupportLibrary
TableGen/index
TableGenFundamentals
Vectorizers
WritingAnLLVMPass
[unchanged lines elided]

:doc:`DirectXUsage`
This document describes using the DirectX target to compile GPU code for the
DirectX runtime.

:doc:`RISCVUsage`
This document describes using the RISCV-V target.

+ :doc:`RISCV/RISCVVectorExtension`
+ This document describes how code is generated for the RISC-V Vector extension.
