diff --git a/llvm/docs/RISCV/RISCVVectorExtension.rst b/llvm/docs/RISCV/RISCVVectorExtension.rst
new file mode 100644
--- /dev/null
+++ b/llvm/docs/RISCV/RISCVVectorExtension.rst
@@ -0,0 +1,407 @@
+=========================
+ RISC-V Vector Extension
+=========================
+
+.. contents::
+   :local:
+
+The RISC-V Vector extension provides vector computation capabilities to the RISC-V architecture [RVV]_.
+
+This guide is based on the original RFC proposing code generation for the extension [RVV-CodeGen-RFC]_. It briefly outlines the features of the extension and gives an overview of how the RISC-V backend generates code for it.
+
+Overview
+========
+
+The vector extension adds 32 vector registers ``v0``, ``v1``, ..., ``v31`` to the ISA.
+Unlike typical SIMD ISAs, the size in bits of each vector register is an implementation-specific parameter called ``VLEN`` and must be a power of two.
+``VLEN`` may also have additional constraints depending on the exact vector extension; see :ref:`standard vector extensions` for more details.
+
+Vector registers are partitioned (i.e. densely packed) into elements whose size in bits is a power of two, ranging from 8 to a maximum called ``ELEN``.
+``ELEN`` is also a power of two and :math:`\texttt{ELEN} \leq \texttt{VLEN}`.
+
+Due to encoding constraints, not all the operands of a vector operation are encoded in the instructions themselves.
+Two CSRs (control and status registers) are used instead:
+
+- ``vl``: the number of elements being operated on, called the vector length. A vector instruction operates on elements ``0`` to ``vl-1``.
+- ``vtype``: the vector type. This register encodes the element size of the operation, called the standard element width (SEW), and a vector grouping mechanism called the length multiplier (LMUL).
+
+
+Length multiplier
+-----------------
+
+The length multiplier (LMUL) can take values 1, 2, 4, 8, 1/2, 1/4, 1/8.
+It is encoded as a power of two, where :math:`\text{LMUL} = 2^k, -3 \leq k \leq 3`.
+
+- When :math:`\text{LMUL} = 1` the vector instructions operate on the (32) vector registers.
+- When :math:`\text{LMUL} \lt 1` the vector instructions operate on the lowest half, quarter or eighth of a vector register.
+- When :math:`\text{LMUL} \gt 1` the vector instructions operate on vector groups encoded in the instruction using the lowest numbered vector register of the group.
+  A vector group is the set of consecutive vector registers ``v{LMUL*i}``, ``v{LMUL*i+1}``, ... , ``v{LMUL*(i + 1) - 1}``. So:
+
+  - :math:`\text{LMUL}=2` has 16 groups: ``v0``, ``v2``, ``v4``, ..., ``v28``, ``v30``
+  - :math:`\text{LMUL}=4` has 8 groups: ``v0``, ``v4``, ``v8``, ``v12``, ``v16``, ``v20``, ``v24``, ``v28``
+  - :math:`\text{LMUL}=8` has 4 groups: ``v0``, ``v8``, ``v16``, ``v24``
+
+For instance, under :math:`\text{LMUL}=4`, a vector group ``v4`` operand includes vector registers ``v4``, ``v5``, ``v6`` and ``v7`` as if they had been concatenated as a four times larger vector register.
+
+LMUL is useful to align the number of elements in vector codes whose element sizes are different (say when combining vectors of 32- and 64-bit elements) or when doing *widenings* (zero, sign or fp extensions) or *narrowings* (truncations).
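+
+As a minimal sketch of the widening case (the register numbers and the use of ``a0`` as the AVL are illustrative assumptions, and ``ELEN=64`` is assumed so that 64-bit elements are legal), a widening add reads its sources at the current SEW and LMUL and writes a destination whose elements are twice as wide, so the destination occupies a group twice as large:
+
+.. code-block:: nasm
+
+   vsetvli  t0, a0, e32,m1,ta,ma   # sources: 32-bit elements, LMUL=1
+   vwadd.vv v4, v2, v3             # v4 (and v5) ← sext(v2) + sext(v3)
+                                   # destination: 64-bit elements, effective LMUL=2
+
+Both the sources and the destination hold the same number of elements (``vl``), even though their element sizes differ.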
+
+Setting ``vl`` and ``vtype``
+----------------------------
+
+A program must ensure that both ``vl`` and ``vtype`` have the correct values for a vector operation before executing a vector instruction.
+This is done using the ``vsetvli`` instruction.
+
+.. code-block:: nasm
+		
+  vsetvli rdest, rsrc, sew,lmul,tx,mx     # tx,mx is described in Masks and tails
+
+``rsrc`` is the application vector length (AVL) and will be used when setting the ``vl``. ``rdest`` is updated with the value of ``vl``.
+The spec allows some latitude here but a simple functional model of what ``vsetvli`` does is the following:
+
+.. math::
+   
+  \text{vl} &\gets \min(\text{rsrc}, \frac{\text{LMUL} \times \text{VLEN}}{\text{SEW}}) \\
+  \text{vtype} &\gets \text{SEW},\text{LMUL},\dots
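+
+As a worked instance of this simple model (the concrete values are chosen purely for illustration), take ``VLEN=128`` with ``e32,m2``:
+
+.. math::
+
+  \frac{\text{LMUL} \times \text{VLEN}}{\text{SEW}} &= \frac{2 \times 128}{32} = 8 \\
+  \text{rsrc} = 5 &\implies \text{vl} = \min(5, 8) = 5 \\
+  \text{rsrc} = 20 &\implies \text{vl} = \min(20, 8) = 8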
+
+There is also ``vsetivli`` for when the AVL is an immediate, and ``vsetvl`` for when the AVL and ``vtype`` are both registers.
+
+``vsetvli`` has a couple of special cases:
+
+- When ``rsrc`` is ``x0`` and ``rdest`` is not ``x0`` then :math:`\text{vl} \gets \text{LMUL} \times \frac{\text{VLEN}}{\text{SEW}}`.
+  In other words, it sets ``vl`` to the maximum vector length for the given LMUL and SEW.
+  This is useful for whole-register operations.
+  
+  .. code-block:: nasm
+		
+     vsetvli t0, x0, e32,m2,ta,ma # vl ← 2*VLEN/32
+                                  # vtype ← e32,m2,…
+                                  # t0 ← vl
+
+- When ``rsrc`` and ``rdest`` are both ``x0`` (the hard-coded zero of RISC-V) then ``vl`` is used as the AVL. This can be used to change the ``vtype`` when we know the ratio :math:`\frac{\text{SEW}}{\text{LMUL}}` will be preserved.
+  
+  .. code-block:: nasm
+		
+     vsetvli x0, x0, e64,m4,ta,ma # changing vtype from e32,m2 to e64,m4 is OK (vl is unchanged)
+                                  # vtype ← e64,m4,…
+
+Two simple examples (register ``x10`` contains the AVL):
+
+- Add two 32-bit element vectors under :math:`\text{LMUL}=1`
+  
+  .. code-block:: nasm
+		
+     vsetvli x0, x10, e32,m1,ta,ma
+     vadd.vv v1, v2, v3  # v1[0:vl-1] ← v2[0:vl-1] + v3[0:vl-1]
+                         # where v[i:j] is all v[x] where i <= x <= j
+
+- Add two 64-bit element vectors under :math:`\text{LMUL}=2`
+  
+  .. code-block:: nasm
+		
+     vsetvli x0, x10, e64,m2,ta,ma
+     vadd.vv v2, v4, v6  # Updates v2 and v3. Reads v4, v5 and v6, v7
+                         # v2[0:x-1] ← v4[0:x-1] + v6[0:x-1] where x = min(VLEN/64, vl)
+                         # v3[0:y-1] ← v5[0:y-1] + v7[0:y-1] where y = vl - x
+
+.. note::			 
+
+   ``vsetvli`` is commonly used for stripmining, like in the example below:
+
+   .. code-block:: nasm
+ 
+      # on entry:
+      #  a0 holds the total number of elements
+      #  a1 holds the address of the source array
+      loop:
+          vsetvli t0, a0, e32,m8,ta,ma    # setup VL, LMUL=8
+          vle32.v v8, (a1)                # load elements
+          vadd.vi v8, v8, 1               # process elements
+          vse32.v v8, (a1)                # store updated elements
+          sub     a0, a0, t0              # decrement count
+          slli    t0, t0, 2               # scale element count to bytes (4 per element)
+          add     a1, a1, t0              # increment address
+          bnez    a0, loop                # loop until all processed
+
+   The way you would read the ``vsetvli`` is as follows:
+
+   - ``e32,m8``: Group the registers together into groups of 8 (:math:`\text{LMUL}=8`) and partition them into 32-bit elements.
+   - ``ta,ma``: Be tail agnostic and mask agnostic: we don't care about what's in the elements that aren't processed.
+   - ``a0``: Try to process ``a0`` elements, or as many as the hardware supports.
+   - ``t0``: Store ``vl``, i.e. the number of elements that will be processed this iteration.
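+
+As a worked trip count for the stripmining loop above, under the simple :math:`\min` model of ``vsetvli`` and with values chosen purely for illustration (``VLEN=128`` and 70 elements in ``a0``), each iteration can process at most :math:`8 \times 128 / 32 = 32` elements:
+
+.. math::
+
+  \text{iteration 1:}\quad \text{vl} &= \min(70, 32) = 32, \quad \text{remaining} = 70 - 32 = 38 \\
+  \text{iteration 2:}\quad \text{vl} &= \min(38, 32) = 32, \quad \text{remaining} = 38 - 32 = 6 \\
+  \text{iteration 3:}\quad \text{vl} &= \min(6, 32) = 6, \quad \text{remaining} = 6 - 6 = 0
+
+The address in ``a1`` advances by :math:`4 \times \text{vl}` bytes per iteration, i.e. by 128, 128 and 24 bytes. An implementation that uses the latitude allowed by the spec may split the last two iterations differently.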
+
+.. _masks and tails:
+
+Masks and tails
+---------------
+The RISC-V Vector extension supports masks in almost all of its instructions.
+There are no dedicated mask registers; instead, vector registers are used to hold masks.
+
+However, an instruction whose execution is masked can only use the ``v0`` register as the mask operand.
+Elements of the destination register that are masked off are called *inactive elements*.
+
+A vector instruction can be executed under a ``vl`` setting where :math:`\texttt{vl} \lt \text{LMUL} \times \frac{\texttt{VLEN}}{\text{SEW}}`.
+Elements of the destination register past the current ``vl`` are called the *tail elements*.
+
+There are two modes for the tail and inactive elements:
+
+- undisturbed, in which the elements of the destination register are left unmodified
+- agnostic, in which each element of the destination register is either left unmodified or has all its bits set to 1 (for debugging purposes). In this mode we cannot assume anything about the bits of those elements
+
+The ``tx,mx`` operands of ``vsetvli`` above select these policies for the tail and inactive elements respectively, and can be combined in 4 ways:
+
+- ``tu,mu``: Both tail and inactive are left undisturbed
+- ``ta,ma``: Both tail and inactive are agnostic
+- ``tu,ma``: Tail is left undisturbed and inactive are agnostic
+- ``ta,mu``: Tail is agnostic and inactive are left undisturbed.
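+
+As a minimal sketch of why the mask policy matters (the registers and the absolute-value computation are illustrative assumptions, not taken from the RFC), the following negates only the negative elements of ``v2``. It relies on ``mu`` so that the inactive, already non-negative elements keep their values:
+
+.. code-block:: nasm
+
+   vsetvli  t0, a0, e32,m1,ta,mu   # tail agnostic, mask undisturbed
+   vmslt.vx v0, v2, zero           # v0[i] ← (v2[i] < 0)
+   vrsub.vi v2, v2, 0, v0.t        # v2[i] ← 0 - v2[i] where v0[i] is set
+                                   # inactive elements of v2 are left unmodified (mu)
+
+With ``ma`` instead, the inactive elements of ``v2`` could be overwritten with all ones, so the result would no longer be usable as an absolute value.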
+
+  
+
+.. _standard vector extensions:
+
+Standard vector extensions
+--------------------------
+   
+Formally, the vector extension exists in multiple variants, each of which imposes additional constraints on ``VLEN`` and ``EEW`` (the effective ``SEW`` for a specific vector operand):
+
+``Zvl*``
+   Extensions of the form ``Zvl32b``, ``Zvl64b``, etc.
+   These don't actually contain any instructions but just dictate the minimum required ``VLEN``.
+   All the extensions below require one of the ``Zvl`` extensions.
+
+``Zve``
+   A smaller subset of the vector extension designed for use in embedded devices.
+   Specifies a minimum ``VLEN`` and the range of supported ``EEW`` values.
+   For example, ``Zve32x`` requires ``Zvl32b`` and supports ``EEW = {8, 16, 32}``.
+   ``Zve64f`` requires ``Zvl64b``, supports ``EEW = {8, 16, 32, 64}`` and also provides 32-bit floating point instructions.
+
+``v``
+   This is the single letter version of the vector extension intended for use in application contexts.
+   It requires ``Zvl128b`` as well as the ``f`` and ``d`` extensions, and provides all the instructions defined in the specification.
+
+
+Mapping to LLVM IR Types
+========================
+
+Since ``VLEN`` is an unknown constant from the compiler's perspective, the RISC-V backend takes the same approach as AArch64's SVE and uses scalable vector types [SVE-RFC]_.
+
+Scalable vector types are of the form ``<vscale x n x ty>``, which indicates a vector with ``vscale * n`` elements of type ``ty``.
+LLVM supports only ``ELEN=32`` or ``ELEN=64``, so ``vscale`` is defined as ``VLEN/64``.
+This makes the LLVM IR types stable between the two ``ELEN`` values considered, i.e. every LLVM IR scalable vector type has exactly one corresponding pair of element type and LMUL, and vice-versa.
+
++-------------------+---------------+----------------+------------------+-------------------+-------------------+-------------------+-------------------+
+|                   | LMUL=⅛        | LMUL=¼         | LMUL=½           | LMUL=1            | LMUL=2            | LMUL=4            | LMUL=8            |
++===================+===============+================+==================+===================+===================+===================+===================+
+| i64 (ELEN=64)     | N/A           | N/A            | N/A              | <v x 1 x i64>     | <v x 2 x i64>     | <v x 4 x i64>     | <v x 8 x i64>     |
++-------------------+---------------+----------------+------------------+-------------------+-------------------+-------------------+-------------------+
+| i32               | N/A           | N/A            | <v x 1 x i32>    | <v x 2 x i32>     | <v x 4 x i32>     | <v x 8 x i32>     | <v x 16 x i32>    |
++-------------------+---------------+----------------+------------------+-------------------+-------------------+-------------------+-------------------+
+| i16               | N/A           | <v x 1 x i16>  | <v x 2 x i16>    | <v x 4 x i16>     | <v x 8 x i16>     | <v x 16 x i16>    | <v x 32 x i16>    |
++-------------------+---------------+----------------+------------------+-------------------+-------------------+-------------------+-------------------+
+| i8                | <v x 1 x i8>  | <v x 2 x i8>   | <v x 4 x i8>     | <v x 8 x i8>      | <v x 16 x i8>     | <v x 32 x i8>     | <v x 64 x i8>     |
++-------------------+---------------+----------------+------------------+-------------------+-------------------+-------------------+-------------------+
+| double (ELEN=64)  | N/A           | N/A            | N/A              | <v x 1 x double>  | <v x 2 x double>  | <v x 4 x double>  | <v x 8 x double>  |
++-------------------+---------------+----------------+------------------+-------------------+-------------------+-------------------+-------------------+
+| float             | N/A           | N/A            | <v x 1 x float>  | <v x 2 x float>   | <v x 4 x float>   | <v x 8 x float>   | <v x 16 x float>  |
++-------------------+---------------+----------------+------------------+-------------------+-------------------+-------------------+-------------------+
+| half              | N/A           | <v x 1 x half> | <v x 2 x half>   | <v x 4 x half>    | <v x 8 x half>    | <v x 16 x half>   | <v x 32 x half>   |
++-------------------+---------------+----------------+------------------+-------------------+-------------------+-------------------+-------------------+
+
+(Read ``<v x k x ty>`` as ``<vscale x k x ty>``)
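+
+The table entries can be derived arithmetically. As a sketch (the symbol :math:`n` for the element count multiplier and the choice of ``VLEN=256`` below are only for illustration), a type ``<vscale x n x ty>`` with elements of SEW bits spans :math:`\texttt{vscale} \times n \times \text{SEW}` bits, and dividing by ``VLEN`` gives the LMUL:
+
+.. math::
+
+  \text{LMUL} &= \frac{\texttt{vscale} \times n \times \text{SEW}}{\texttt{VLEN}} = \frac{n \times \text{SEW}}{64} \\
+  n = 4,\ \text{SEW} = 32 &\implies \text{LMUL} = \frac{4 \times 32}{64} = 2 \\
+  n = 1,\ \text{SEW} = 8 &\implies \text{LMUL} = \frac{1 \times 8}{64} = \frac{1}{8}
+
+For example, with ``VLEN=256`` the value of ``vscale`` is 4, so ``<vscale x 4 x i32>`` holds 16 elements, i.e. 512 bits or two vector registers.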
+
+One downside of this design is that it doesn't allow vectors of ``i128`` (that is, ``ELEN=128``).
+In that case ``vscale`` would have to be 1/2 under :math:`\text{LMUL}=1`.
+This type (and its floating-point counterpart ``fp128``) is not that common, and in case of extreme necessity the types for :math:`\text{LMUL}=2` could be used instead.
+
+Additionally, this design prevents us from computing a value for ``vscale`` when ``VLEN=32``, since ``VLEN/64`` would not be an integer.
+
+Mask vector types
+-----------------
+
+As for mask vectors, they are physically represented using a layout of densely packed bits in a vector register.
+They are mapped to the following LLVM IR types:
+
+- ``<vscale x 1 x i1>``
+- ``<vscale x 2 x i1>``
+- ``<vscale x 4 x i1>``
+- ``<vscale x 8 x i1>``
+- ``<vscale x 16 x i1>``
+- ``<vscale x 32 x i1>``
+- ``<vscale x 64 x i1>``
+
+Two types with the same ratio SEW/LMUL will have the same related mask type. For instance, two different comparisons, one under SEW=64, LMUL=2 and the other under SEW=32, LMUL=1, will both generate a mask of type ``<vscale x 2 x i1>``.
+
+Register classes
+================
+
+There are four register classes for vectors:
+
+- ``VR`` for vector registers (``v0``, ``v1``, ..., ``v31``). Used when :math:`\text{LMUL} \leq 1` and for mask registers.
+- ``VRM2`` for vector groups of length 2 i.e. :math:`\text{LMUL}=2` (``v0m2``, ``v2m2``, ..., ``v30m2``)
+- ``VRM4`` for vector groups of length 4 i.e. :math:`\text{LMUL}=4` (``v0m4``, ``v4m4``, ..., ``v28m4``)
+- ``VRM8`` for vector groups of length 8 i.e. :math:`\text{LMUL}=8` (``v0m8``, ``v8m8``, ..., ``v24m8``)
+
+:math:`\text{LMUL} \lt 1` types and mask types do not benefit from having a dedicated class, so ``VR`` is used in their case.
+      
+.. _scalable vector codegen:
+
+Scalable Vector Codegen
+=======================
+
+Let's consider a very simple case using a whole-register op (this example uses :math:`\text{LMUL}=2`)
+
+.. code-block:: llvm
+		
+   %c = add <vscale x 4 x i32> %a, %b
+
+From the above we get the following ISel DAG:
+
+.. code-block::
+   
+  t5: nxv4i32 = add t2, t4
+
+Which then gets selected as a pseudo instruction:
+
+.. code-block::
+   
+  t6: nxv4i32 = PseudoVADD_VV_M2 t2, t4, TargetConstant:i32<-1>, TargetConstant:i32<5>
+
+Each vector instruction has multiple pseudo instructions defined in ``RISCVInstrInfoVPseudos.td``, with their patterns defined in ``RISCVInstrInfoVSDPatterns.td``.
+For example, ``VADD_VV`` has pseudo instructions for ``PseudoVADD_VV_M1``, ``PseudoVADD_VV_M2``, and so on.
+
+The ``M2``  suffix means that we're operating on groups of :math:`\text{LMUL}=2`, and the ``VV`` suffix means we're doing a vector-vector operation (i.e. ``vadd.vv``).
+Other suffixes include ``VX`` for vector-scalar and ``VI`` for vector-immediate.
+
+The first two operands to the pseudo instruction, ``t2`` and ``t4``, are the inputs to the regular ``VADD_VV`` instruction, ``vs2`` and ``vs1`` respectively.
+
+The third is the AVL, i.e. how many elements we want to operate on, and is of type ``XLenVT``. It's set to -1 here because we want to operate on all the elements.
+
+.. note::
+   
+   Pseudo instructions ending in ``TU`` are executed in tail undisturbed mode (see :ref:`masks and tails`).
+   They take an additional merge operand which is a vector whose elements should be preserved in the tail.
+
+The last operand is SEW, encoded as its base-2 logarithm, which is ``5`` here (:math:`32 = 2^5`).
+
+The AVL and SEW operands aren't actually part of the ``vadd.vv`` instruction, but instead are used by the ``RISCVInsertVSETVLI.cpp`` pass to insert the necessary ``vsetvli`` instruction in front of it, after which the MIR looks like this:
+
+.. code-block::
+   
+  dead %3:gpr = PseudoVSETVLIX0 $x0, 209, implicit-def $vl, implicit-def $vtype
+  %2:vrm2 = PseudoVADD_VV_M2 %0:vrm2, %1:vrm2, -1, 5, implicit $vl, implicit $vtype
+
+Now the physical ``$vl`` and ``$vtype`` registers are set up correctly: they are implicitly defined by the ``VSETVLI`` and then implicitly used by the ``VADD``.
+See ``RISCVVType::encodeVTYPE`` for details on how ``vtype`` is encoded (``209`` in this example).
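+
+As a worked decoding of ``209``, assuming the ``vtype`` bit layout of version 1.0 of the specification (``vlmul`` in bits 2:0, ``vsew`` in bits 5:3, ``vta`` in bit 6 and ``vma`` in bit 7):
+
+.. math::
+
+  209 &= 128 + 64 + 16 + 1 \\
+      &= 1 \times 2^7 + 1 \times 2^6 + 2 \times 2^3 + 1 \times 2^0
+
+That is, ``vma = 1`` (``ma``), ``vta = 1`` (``ta``), ``vsew = 2`` (:math:`\text{SEW} = 8 \times 2^2 = 32`, ``e32``) and ``vlmul = 1`` (:math:`\text{LMUL} = 2^1`, ``m2``), which matches the ``e32,m2,ta,ma`` seen in the final assembly.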
+
+.. note::
+
+   It is not necessary to emit a ``vsetvli`` instruction before every vector instruction if the current ``vl`` and ``vtype`` are still suitable for the intended vector operation.
+   ``RISCVInsertVSETVLI.cpp`` takes this into account and won't insert an instruction if neither ``vl`` nor ``vtype`` changes.
+
+After register allocation, the ``RISCVExpandPseudoInsts.cpp`` pass then expands out the ``PseudoVSETVLI``.
+
+.. code-block::
+   
+   dead $x10 = VSETVLI $x0, 209, implicit-def $vtype, implicit-def $vl
+   renamable $v8m2 = PseudoVADD_VV_M2 killed renamable $v8m2, killed renamable $v10m2, -1, 5, implicit $vl, implicit $vtype
+
+Finally, ``AsmPrinter`` lowers the pseudo instructions into real ``MCInsts``, discarding unneeded operands.
+Note that the existing pseudo instruction remains until MCInst lowering.
+See ``lowerRISCVVMachineInstrToMCInst`` to see how the pseudo instruction is matched up with the actual instruction.
+
+.. code-block:: nasm
+   
+   vsetvli a0, zero, e32,m2,ta,ma
+   vadd.vv v8, v8, v10
+
+Fixed Length Vector Codegen
+===========================
+
+As shown above, instruction selection works on scalable vectors, that is, vectors with a type like ``<vscale x n x t>``.
+Fixed-length vectors like ``<n x t>`` therefore need to be converted to scalable vectors first.
+To assist with this, an intermediate layer of nodes that take an explicit ``VL`` operand is used.
+The nodes and their patterns are defined in ``RISCVInstrInfoVVLPatterns.td``.
+
+For example, for the following LLVM IR on a fixed-length vector of 4 elements:
+
+.. code-block:: llvm
+		
+   %x = add <4 x i32> %a, %b
+
+The initial ISel DAG will look like this:
+
+.. code-block::
+   
+     t4: v4i32 = extract_subvector t2, Constant:i32<0>
+     t7: v4i32 = extract_subvector t6, Constant:i32<0>
+   t8: v4i32 = add t4, t7
+
+But instead of being lowered to a ``PseudoVADD_VV``, it gets converted to a scalable vector and an ``ADD_VL`` SDNode is selected:
+
+.. code-block::
+   
+     t15: nxv2i1 = RISCVISD::VMSET_VL Constant:i32<4>
+   t16: nxv2i32 = RISCVISD::ADD_VL t2, t6, undef:nxv2i32, t15, Constant:i32<4>
+
+These ``_VL`` suffixed nodes are counterparts to their pseudo instructions, but don't specify LMUL and are tagged with a ``VL`` operand, which is 4 here.
+It will be later used by the pass inserting ``vsetvli`` so that it can statically set ``VL`` to the number of elements in the fixed-length vector.
+
+.. note::
+   
+  Because the ``vadd`` can be masked, the third operand of this VL node is a merge operand that supplies the values of undisturbed elements (it is ``undef`` in this example). This operand is tied to the destination. If it is an actual value, it entails ``tu,mu`` (see :ref:`masks and tails`).
+
+  The following operand is a mask operand of type ``<n x i1>``, which is set by ``VMSET``.
+  ``VMSET`` is a RISC-V pseudo instruction (not an LLVM pseudo instruction) that sets the destination register bits to all ones, so this is the equivalent of not using a mask.
+  Its operand is the AVL.
+  
+  The final operand is the explicit ``VL``, of type ``XLenVT``.
+
+It is then selected as the corresponding pseudo instruction with a suitable LMUL:
+
+.. code-block::
+
+       t15: nxv2i1 = PseudoVMSET_M_B2 TargetConstant:i32<4>, TargetConstant:i32<0>
+     t22: ch,glue = CopyToReg t0, Register:nxv2i1 $v0, t15
+   t16: nxv2i32 = PseudoVADD_VV_M1_MASK undef:nxv2i32, t2, t6, Register:nxv2i1 $v0, TargetConstant:i32<4>, TargetConstant:i32<5>, TargetConstant:i32<1>, t22:1
+
+During post-processing, ``RISCVDAGToDAGISel::doPeepholeMaskedRVV`` then detects that the mask in ``$v0`` is all ones and converts the masked form to the unmasked form:
+
+.. code-block::
+
+  t24: nxv2i32 = PseudoVADD_VV_M1 t2, t6, TargetConstant:i32<4>, TargetConstant:i32<5>		
+
+Code generation then proceeds as normal as shown in :ref:`scalable vector codegen`.
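+
+As a sketch of where this ends up (the register assignment and the ``ta,ma`` policy shown here are assumptions; the exact output depends on the LLVM version and on register allocation), the constant AVL of 4 fits the immediate form ``vsetivli``, so the final assembly could look like:
+
+.. code-block:: nasm
+
+   vsetivli zero, 4, e32,m1,ta,ma   # AVL = 4, the element count of <4 x i32>
+   vadd.vv  v8, v8, v9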
+
+Vector Predication instructions
+===============================
+
+Similarly to fixed-length vectors, vector predication intrinsics are lowered to ``VL`` nodes first. So the use of the following ``@llvm.vp`` intrinsic
+
+.. code-block:: llvm
+		
+   %x = call <vscale x 4 x i32> @llvm.vp.add.nxv4i32(<vscale x 4 x i32> %a, <vscale x 4 x i32> %b, <vscale x 4 x i1> %m, i32 4)
+
+Enters the DAG as a ``vp_add`` node:
+
+.. code-block::
+
+   t10: nxv4i32 = vp_add t2, t4, t6, t8
+
+Which ``RISCVTargetLowering::lowerVPOp`` then lowers into the corresponding ``VL`` node:
+
+.. code-block::
+   
+   t15: nxv4i32 = RISCVISD::ADD_VL t2, t4, undef:nxv4i32, t6, Constant:i32<4>
+
+And subsequently into the corresponding masked pseudo instruction, where the mask is copied into ``$v0``:
+
+.. code-block::
+
+       t6: nxv4i1,ch = CopyFromReg t0, Register:nxv4i1 %2
+     t20: ch,glue = CopyToReg t0, Register:nxv4i1 $v0, t6
+   t16: nxv4i32 = PseudoVADD_VV_M2_MASK IMPLICIT_DEF:nxv4i32, t2, t4, Register:nxv4i1 $v0, t8, TargetConstant:i32<5>, TargetConstant:i32<1>, t20:1
+
+References
+==========
+
+.. [RVV] `RISC-V "V" Vector Extension <https://github.com/riscv/riscv-v-spec/blob/v1.0/v-spec.adoc>`_
+.. [RVV-CodeGen-RFC] `[llvm-dev] [RFC] Code generation for RISC-V V-extension <https://lists.llvm.org/pipermail/llvm-dev/2020-October/145850.html>`_
+.. [SVE-RFC] `[RFC][SVE] Supporting SIMD instruction sets with variable vector lengths <https://lists.llvm.org/pipermail/llvm-dev/2018-July/124396.html>`_
diff --git a/llvm/docs/UserGuides.rst b/llvm/docs/UserGuides.rst
--- a/llvm/docs/UserGuides.rst
+++ b/llvm/docs/UserGuides.rst
@@ -59,6 +59,7 @@
    ResponseGuide
    Remarks
    RISCVUsage
+   RISCV/RISCVVectorExtension
    SourceLevelDebugging
    SPIRVUsage
    StackSafetyAnalysis
@@ -261,3 +262,5 @@
 :doc:`RISCVUsage`
    This document describes using the RISCV-V target.
 
+:doc:`RISCV/RISCVVectorExtension`
+   This document describes how code is generated for the RISC-V Vector extension.