This is an archive of the discontinued LLVM Phabricator instance.

Differential D128286

[RISCV] Disable <vscale x 1 x *> types with Zve32x or Zve32f.
Closed · Public

Authored by craig.topper on Jun 21 2022, 9:44 AM.


Details

Reviewers

reames
frasercrmck
rogfer01
kito-cheng
arcbbb
fakepaper56

Commits

rG8b10ffabae48: [RISCV] Disable <vscale x 1 x *> types with Zve32x or Zve32f.

Summary

According to the vector spec, mf8 is not supported for i8 if ELEN
is 32. Similarly, mf4 is not supported for i16/f16, nor mf2 for i32/f32.

Since RVVBitsPerBlock is 64 and LMUL is calculated as
((MinNumElements * ElementSize) / RVVBitsPerBlock) this means we
need to disable any type with MinNumElements==1.
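
As a rough illustration of the formula above, the following sketch computes LMUL as a reduced fraction for a `<vscale x N x iM>` type. The names `Lmul`, `gcdOf`, and `lmulForType` are invented for this example and are not LLVM APIs; only the value 64 for RVVBitsPerBlock comes from the summary.

```cpp
#include <cassert>

// Hypothetical helper type for illustration only; not an LLVM API.
struct Lmul {
  unsigned Num; // numerator
  unsigned Den; // denominator: 8 means mf8, 4 means mf4, 2 means mf2
};

// Value of RVVBitsPerBlock stated in the summary.
const unsigned RVVBitsPerBlock = 64;

// Greatest common divisor, used to reduce the fraction.
inline unsigned gcdOf(unsigned A, unsigned B) {
  while (B != 0) {
    unsigned T = A % B;
    A = B;
    B = T;
  }
  return A;
}

// LMUL = (MinNumElements * ElementSize) / RVVBitsPerBlock, in lowest terms.
inline Lmul lmulForType(unsigned MinNumElements, unsigned ElementSizeInBits) {
  unsigned Bits = MinNumElements * ElementSizeInBits;
  assert(Bits != 0 && "type must have a non-zero minimum size");
  unsigned G = gcdOf(Bits, RVVBitsPerBlock);
  return {Bits / G, RVVBitsPerBlock / G};
}
```

Under these assumptions, `lmulForType(1, 8)` gives 1/8 (mf8), `lmulForType(1, 16)` gives 1/4 (mf4), and `lmulForType(1, 32)` gives 1/2 (mf2). The three combinations the summary lists as unsupported with ELEN=32 are exactly the ones with MinNumElements == 1.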

For generic IR, these types will now be widened in type legalization.
For RVV intrinsics, we'll probably hit a fatal error somewhere. I plan
to work on disabling the intrinsics in the riscv_vector.h header.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	60,110 ms	x64 debian > AddressSanitizer-x86_64-linux-dynamic.TestCases::scariness_score_test.cpp
	60,130 ms	x64 debian > AddressSanitizer-x86_64-linux.TestCases::scariness_score_test.cpp
	60,030 ms	x64 debian > libFuzzer.libFuzzer::fuzzer-leak.test
	60,020 ms	x64 debian > libFuzzer.libFuzzer::minimize_crash.test
	60,010 ms	x64 debian > libFuzzer.libFuzzer::value-profile-load.test

Event Timeline

craig.topper created this revision. Jun 21 2022, 9:44 AM

Herald added a project: Restricted Project. Jun 21 2022, 9:44 AM

Herald added subscribers: sunshaoce, VincentWu, luke957 and 27 others.

craig.topper requested review of this revision. Jun 21 2022, 9:44 AM

Herald added a project: Restricted Project. Jun 21 2022, 9:44 AM

Herald added subscribers: pcwang-thead, eopXD, MaskRay.

Add basic sanity test that type legalization will at least try to handle these types.

Harbormaster completed remote builds in B171131: Diff 438752. Jun 21 2022, 11:39 AM

It's great that MinElts can be obtained from RISCV::RVVBitsPerBlock / Subtarget.getELEN()
LGTM. Thanks!

This revision is now accepted and ready to land. Jun 23 2022, 1:12 AM

Warning, I may be misunderstanding the problem you're solving here, but...

The case you mention appears to be specific to VLEN=32, right? If so, a cleaner way of phrasing the illegal cases would seem to be to compute the effective vector length after LMUL (i.e. VLEN/8 for mf8), and then disallow any case where the implied vector length is shorter than a single element of the element type.

A case to consider: what happens if VLEN=64? Should we be disallowing e.g. mf2 e64? (Seems like we should, right?) If so, can we approach the problem the same way?

The use of RVVBitsPerBlock feels like a red herring here. In particular, it's not clear to me why the value remains 64 on a VLEN=32 configuration.

Anyway, this is purely a non-blocking comment. Not sure if I've actually understood this or not. :)

This revision was landed with ongoing or failed builds. Jun 23 2022, 8:49 AM

Closed by commit rG8b10ffabae48: [RISCV] Disable <vscale x 1 x *> types with Zve32x or Zve32f. (authored by craig.topper).

This revision was automatically updated to reflect the committed changes.

craig.topper added a commit: rG8b10ffabae48: [RISCV] Disable <vscale x 1 x *> types with Zve32x or Zve32f.

In D128286#3605390, @reames wrote:

Warning, I may be misunderstanding the problem you're solving here, but...

The case you mention appears to be specific to VLEN=32, right? If so, a cleaner way of phrasing the illegal cases would seem to be to compute the effective vector length after LMUL (i.e. VLEN/8 for mf8), and then disallow any case where the implied vector length is shorter than a single element of the element type.

The problem is still valid for VLEN>=64 ELEN=32. While you are correct that a SEW=8 LMUL=1/8 would fit in a register for that config, there is no requirement in the spec for hardware to support it. The spec says that the smallest fractional LMUL is SEWMIN/ELEN, which would be 8/32 or 1/4 in this config. The spec goes on to say "For a given supported fractional LMUL setting, implementations must support SEW settings between SEWMIN and LMUL * ELEN, inclusive." So if ELEN is 32, LMUL=1/4 is only required to support SEW=8 and LMUL=1/2 is only required to support SEW=8 or 16.
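
The quoted rule can be checked mechanically. The sketch below is only an illustration of that one sentence of the spec as quoted here, restricted to fractional LMUL; the function name `isRequiredFractionalConfig` is hypothetical and not part of LLVM or the spec.

```cpp
#include <cassert>

// Hypothetical helper for illustration only; not an LLVM API.
// Returns true if the quoted rule requires an implementation to support
// element width SEW at fractional LMUL = 1/LmulDenominator for the given ELEN.
// The rule is SEWMIN <= SEW <= LMUL * ELEN, with SEWMIN = 8.
inline bool isRequiredFractionalConfig(unsigned SEW, unsigned LmulDenominator,
                                       unsigned ELEN) {
  assert((LmulDenominator == 2 || LmulDenominator == 4 ||
          LmulDenominator == 8) &&
         "only fractional LMUL (mf2, mf4, mf8) is modelled here");
  const unsigned SEWMin = 8;
  // SEW <= ELEN / LmulDenominator, rewritten without division.
  return SEW >= SEWMin && SEW * LmulDenominator <= ELEN;
}
```

With ELEN=32 this reports mf8 as not required for any SEW, mf4 as required only for SEW=8, and mf2 as required for SEW=8 and SEW=16, which matches the cases listed in the comment. With ELEN=64, mf8 with SEW=8 becomes required.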

A case to consider: what happens if VLEN=64? Should we be disallowing e.g. mf2 e64? (Seems like we should, right?) If so, can we approach the problem the same way?

mf2 e64 was already implicitly disabled: <vscale x 1 x i64> is LMUL=1.

The use of RVVBitsPerBlock feels like a red herring here. In particular, it's not clear to me why the value remains 64 on a VLEN=32 configuration.

You are correct that RVVBitsPerBlock should change with Zve32 in order to support VLEN=32. Unfortunately, that would change the type mapping from vscale to LMUL and require substantial changes to the tablegen patterns. Naively, I think it would roughly double the size of the isel table. Since we use MVT types to pick instruction register classes, we would need 2 sets of patterns.

If we did change RVVBitsPerBlock for Zve32 the problem I'm trying to fix here would go away.

In D128286#3605455, @craig.topper wrote:

The problem is still valid for VLEN>=64 ELEN=32. While you are correct that a SEW=8 LMUL=1/8 would fit in a register for that config, there is no requirement in the spec for hardware to support it. The spec says that the smallest fractional LMUL is SEWMIN/ELEN, which would be 8/32 or 1/4 in this config. The spec goes on to say "For a given supported fractional LMUL setting, implementations must support SEW settings between SEWMIN and LMUL * ELEN, inclusive." So if ELEN is 32, LMUL=1/4 is only required to support SEW=8 and LMUL=1/2 is only required to support SEW=8 or 16.

I think I can still phrase that in the "does this fit" manner. I just need to introduce a MINVLEN (which is simply ELEN), and ask whether the implied vector length for a given LMUL contains at least one element.

In D128286#3605512, @reames wrote:

I think I can still phrase that in the "does this fit" manner. I just need to introduce a MINVLEN (which is simply ELEN), and ask whether the implied vector length for a given LMUL contains at least one element.

Let's run through the math.

vscale is defined as (VLEN/RVVBitsPerBlock).
So a <vscale x 1 x i8> type is ((VLEN/RVVBitsPerBlock) * 1 * 8) bits.

Let's replace VLEN with MINVLEN (or ELEN):
size = ((ELEN/RVVBitsPerBlock) * 1 * 8) bits.

If we want to know how many 8-bit elements that holds, we get
minnumelts = (ELEN/RVVBitsPerBlock) * 1.

For ELEN=32 and RVVBitsPerBlock=64, that value is less than 1.

We want to exclude all types <vscale x y x i8> that can't hold a whole element, i.e. types where ((ELEN/RVVBitsPerBlock) * y) < 1 is true.

This is true when y < (RVVBitsPerBlock / ELEN).

If RVVBitsPerBlock followed ELEN, the definition of vscale would also follow ELEN and this would go away.
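
The last rearrangement in the math above can be sanity-checked with a small sketch. The function names here are made up for illustration; the only values taken from the discussion are RVVBitsPerBlock=64 and the two forms of the inequality. The sketch assumes ELEN divides 64, which holds for ELEN=32 and ELEN=64.

```cpp
#include <cassert>

// Value of RVVBitsPerBlock used throughout the discussion.
const unsigned RVVBitsPerBlock = 64;

// Direct form of the condition: ((ELEN / RVVBitsPerBlock) * y) < 1,
// multiplied through by RVVBitsPerBlock to stay in integer arithmetic.
inline bool cannotHoldWholeElement(unsigned Y, unsigned ELEN) {
  return ELEN * Y < RVVBitsPerBlock;
}

// Rearranged form: y < (RVVBitsPerBlock / ELEN). Exact when ELEN divides 64.
inline bool belowMinElts(unsigned Y, unsigned ELEN) {
  assert(ELEN != 0 && RVVBitsPerBlock % ELEN == 0 && "ELEN must divide 64");
  return Y < RVVBitsPerBlock / ELEN;
}
```

For ELEN=32 both forms reject only y=1, and for ELEN=64 neither rejects anything, which is consistent with the patch computing MinElts as RISCV::RVVBitsPerBlock / Subtarget.getELEN().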

luke mentioned this in D142348: [RISCV][Docs] Document code generation for vector extension. Feb 6 2023, 3:02 PM

Revision Contents

Path	Size
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp	5 lines
llvm/lib/Target/RISCV/RISCVISelLowering.cpp	40 lines
llvm/test/CodeGen/RISCV/rvv/zve32-types.ll	94 lines

Diff 438752

llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp

 //===------- LegalizeVectorTypes.cpp - Legalization of vector types -------===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//
 //
 [4,001 lines not shown]

 SDValue DAGTypeLegalizer::WidenVecRes_BinaryCanTrap(SDNode *N) {
   // Binary op widening for operations that can trap.
   unsigned Opcode = N->getOpcode();
   SDLoc dl(N);
   EVT WidenVT = TLI.getTypeToTransformTo(*DAG.getContext(), N->getValueType(0));
   EVT WidenEltVT = WidenVT.getVectorElementType();
   EVT VT = WidenVT;
-  unsigned NumElts = VT.getVectorNumElements();
+  unsigned NumElts = VT.getVectorMinNumElements();
   const SDNodeFlags Flags = N->getFlags();
   while (!TLI.isTypeLegal(VT) && NumElts != 1) {
     NumElts = NumElts / 2;
     VT = EVT::getVectorVT(*DAG.getContext(), WidenEltVT, NumElts);
   }

   if (NumElts != 1 && !TLI.canOpTrap(N->getOpcode(), VT)) {
     // Operation doesn't trap so just widen as normal.
     SDValue InOp1 = GetWidenedVector(N->getOperand(0));
     SDValue InOp2 = GetWidenedVector(N->getOperand(1));
     return DAG.getNode(N->getOpcode(), dl, WidenVT, InOp1, InOp2, Flags);
   }

+  // FIXME: Improve support for scalable vectors.
+  assert(!VT.isScalableVector() && "Scalable vectors not handled yet.");
+
   // No legal vector version so unroll the vector operation and then widen.
   if (NumElts == 1)
     return DAG.UnrollVectorOp(N, WidenVT.getVectorNumElements());

   // Since the operation can trap, apply operation on the original vector.
   EVT MaxVT = VT;
   SDValue InOp1 = GetWidenedVector(N->getOperand(0));
   SDValue InOp2 = GetWidenedVector(N->getOperand(1));
 [2,641 lines not shown]

llvm/lib/Target/RISCV/RISCVISelLowering.cpp

 //===-- RISCVISelLowering.cpp - RISCV DAG Lowering Implementation --------===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//
 //
 [98 lines not shown; enclosing context: static const MVT::SimpleValueType F16VecVTs[] = {]
       MVT::nxv8f16, MVT::nxv16f16, MVT::nxv32f16};
   static const MVT::SimpleValueType F32VecVTs[] = {
       MVT::nxv1f32, MVT::nxv2f32, MVT::nxv4f32, MVT::nxv8f32, MVT::nxv16f32};
   static const MVT::SimpleValueType F64VecVTs[] = {
       MVT::nxv1f64, MVT::nxv2f64, MVT::nxv4f64, MVT::nxv8f64};

   if (Subtarget.hasVInstructions()) {
     auto addRegClassForRVV = [this](MVT VT) {
+      // Disable the smallest fractional LMUL types if ELEN is less than
+      // RVVBitsPerBlock.
+      unsigned MinElts = RISCV::RVVBitsPerBlock / Subtarget.getELEN();
+      if (VT.getVectorMinNumElements() < MinElts)
+        return;
+
       unsigned Size = VT.getSizeInBits().getKnownMinValue();
       const TargetRegisterClass *RC;
       if (Size <= RISCV::RVVBitsPerBlock)
         RC = &RISCV::VRRegClass;
       else if (Size == 2 * RISCV::RVVBitsPerBlock)
         RC = &RISCV::VRM2RegClass;
       else if (Size == 4 * RISCV::RVVBitsPerBlock)
         RC = &RISCV::VRM4RegClass;
 [344 lines not shown; enclosing context: if (!Subtarget.is64Bit()) {]
       setOperationAction({ISD::VP_REDUCE_ADD, ISD::VP_REDUCE_AND,
                           ISD::VP_REDUCE_OR, ISD::VP_REDUCE_XOR,
                           ISD::VP_REDUCE_SMAX, ISD::VP_REDUCE_SMIN,
                           ISD::VP_REDUCE_UMAX, ISD::VP_REDUCE_UMIN},
                          MVT::i64, Custom);
     }

     for (MVT VT : BoolVecVTs) {
+      if (!isTypeLegal(VT))
+        continue;
+
       setOperationAction(ISD::SPLAT_VECTOR, VT, Custom);

       // Mask VTs are custom-expanded into a series of standard nodes
       setOperationAction({ISD::TRUNCATE, ISD::CONCAT_VECTORS,
                           ISD::INSERT_SUBVECTOR, ISD::EXTRACT_SUBVECTOR},
                          VT, Custom);

       setOperationAction({ISD::INSERT_VECTOR_ELT, ISD::EXTRACT_VECTOR_ELT}, VT,
 [31 lines not shown; enclosing context: for (MVT VT : BoolVecVTs) {]
       }

       setOperationAction(
           {ISD::VP_FPTOSI, ISD::VP_FPTOUI, ISD::VP_TRUNCATE, ISD::VP_SETCC}, VT,
           Custom);
     }

     for (MVT VT : IntVecVTs) {
-      if (VT.getVectorElementType() == MVT::i64 &&
-          !Subtarget.hasVInstructionsI64())
+      if (!isTypeLegal(VT))
         continue;

       setOperationAction(ISD::SPLAT_VECTOR, VT, Legal);
       setOperationAction(ISD::SPLAT_VECTOR_PARTS, VT, Custom);

       // Vectors implement MULHS/MULHU.
       setOperationAction({ISD::SMUL_LOHI, ISD::UMUL_LOHI}, VT, Expand);

 [167 lines not shown; enclosing context: if (Subtarget.hasVInstructions()) {]
     const auto SetCommonVFPExtLoadTruncStoreActions =
         [&](MVT VT, ArrayRef<MVT::SimpleValueType> SmallerVTs) {
           for (auto SmallVT : SmallerVTs) {
             setTruncStoreAction(VT, SmallVT, Expand);
             setLoadExtAction(ISD::EXTLOAD, VT, SmallVT, Expand);
           }
         };

-    if (Subtarget.hasVInstructionsF16())
-      for (MVT VT : F16VecVTs)
+    if (Subtarget.hasVInstructionsF16()) {
+      for (MVT VT : F16VecVTs) {
+        if (!isTypeLegal(VT))
+          continue;
         SetCommonVFPActions(VT);
+      }
+    }

-    for (MVT VT : F32VecVTs) {
-      if (Subtarget.hasVInstructionsF32())
+    if (Subtarget.hasVInstructionsF32()) {
+      for (MVT VT : F32VecVTs) {
+        if (!isTypeLegal(VT))
+          continue;
         SetCommonVFPActions(VT);
-      SetCommonVFPExtLoadTruncStoreActions(VT, F16VecVTs);
+        SetCommonVFPExtLoadTruncStoreActions(VT, F16VecVTs);
+      }
     }

-    for (MVT VT : F64VecVTs) {
-      if (Subtarget.hasVInstructionsF64())
+    if (Subtarget.hasVInstructionsF64()) {
+      for (MVT VT : F64VecVTs) {
+        if (!isTypeLegal(VT))
+          continue;
         SetCommonVFPActions(VT);
-      SetCommonVFPExtLoadTruncStoreActions(VT, F16VecVTs);
-      SetCommonVFPExtLoadTruncStoreActions(VT, F32VecVTs);
+        SetCommonVFPExtLoadTruncStoreActions(VT, F16VecVTs);
+        SetCommonVFPExtLoadTruncStoreActions(VT, F32VecVTs);
+      }
     }

     if (Subtarget.useRVVForFixedLengthVectors()) {
       for (MVT VT : MVT::integer_fixedlen_vector_valuetypes()) {
         if (!useRVVForFixedLengthVectorVT(VT))
           continue;

         // By default everything must be expanded.
         for (unsigned Op = 0; Op < ISD::BUILTIN_OP_END; ++Op)
 [11,391 lines not shown]

llvm/test/CodeGen/RISCV/rvv/zve32-types.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: sed 's/iXLen/i32/g' %s | llc -mtriple=riscv32 -mattr=+zve32f,+f,+zvl64b \
				; RUN: -verify-machineinstrs | FileCheck %s --check-prefixes=CHECK,RV32
				; RUN: sed 's/iXLen/i64/g' %s | llc -mtriple=riscv64 -mattr=+zve32f,+f,+zvl64b \
				; RUN: -verify-machineinstrs | FileCheck %s --check-prefixes=CHECK,RV64

				; Sanity check that type legalization kicks in for vscale x 1 types with Zve32.

				; NOTE: The load and store are widened by using VP_LOAD/STORE. The add/fadd are
				; widened by using the next larger LMUL and operating on the whole vector. This
				; isn't optimal, but doesn't crash.

				define void @vadd_vv_nxv1i8(<vscale x 1 x i8>* %pa, <vscale x 1 x i8>* %pb) {
				; CHECK-LABEL: vadd_vv_nxv1i8:
				; CHECK: # %bb.0:
				; CHECK-NEXT: csrr a2, vlenb
				; CHECK-NEXT: srli a2, a2, 3
				; CHECK-NEXT: vsetvli zero, a2, e8, mf4, ta, mu
				; CHECK-NEXT: vle8.v v8, (a0)
				; CHECK-NEXT: vle8.v v9, (a1)
				; CHECK-NEXT: vsetvli a1, zero, e8, mf4, ta, mu
				; CHECK-NEXT: vadd.vv v8, v8, v9
				; CHECK-NEXT: vsetvli zero, a2, e8, mf4, ta, mu
				; CHECK-NEXT: vse8.v v8, (a0)
				; CHECK-NEXT: ret
				%va = load <vscale x 1 x i8>, <vscale x 1 x i8>* %pa
				%vb = load <vscale x 1 x i8>, <vscale x 1 x i8>* %pb
				%vc = add <vscale x 1 x i8> %va, %vb
				store <vscale x 1 x i8> %vc, <vscale x 1 x i8>* %pa
				ret void
				}

				define void @vadd_vv_nxv1i16(<vscale x 1 x i16>* %pa, <vscale x 1 x i16>* %pb) {
				; CHECK-LABEL: vadd_vv_nxv1i16:
				; CHECK: # %bb.0:
				; CHECK-NEXT: csrr a2, vlenb
				; CHECK-NEXT: srli a2, a2, 3
				; CHECK-NEXT: vsetvli zero, a2, e16, mf2, ta, mu
				; CHECK-NEXT: vle16.v v8, (a0)
				; CHECK-NEXT: vle16.v v9, (a1)
				; CHECK-NEXT: vsetvli a1, zero, e16, mf2, ta, mu
				; CHECK-NEXT: vadd.vv v8, v8, v9
				; CHECK-NEXT: vsetvli zero, a2, e16, mf2, ta, mu
				; CHECK-NEXT: vse16.v v8, (a0)
				; CHECK-NEXT: ret
				%va = load <vscale x 1 x i16>, <vscale x 1 x i16>* %pa
				%vb = load <vscale x 1 x i16>, <vscale x 1 x i16>* %pb
				%vc = add <vscale x 1 x i16> %va, %vb
				store <vscale x 1 x i16> %vc, <vscale x 1 x i16>* %pa
				ret void
				}

				define void @vadd_vv_nxv1i32(<vscale x 1 x i32>* %pa, <vscale x 1 x i32>* %pb) {
				; CHECK-LABEL: vadd_vv_nxv1i32:
				; CHECK: # %bb.0:
				; CHECK-NEXT: csrr a2, vlenb
				; CHECK-NEXT: srli a2, a2, 3
				; CHECK-NEXT: vsetvli zero, a2, e32, m1, ta, mu
				; CHECK-NEXT: vle32.v v8, (a0)
				; CHECK-NEXT: vle32.v v9, (a1)
				; CHECK-NEXT: vsetvli a1, zero, e32, m1, ta, mu
				; CHECK-NEXT: vadd.vv v8, v8, v9
				; CHECK-NEXT: vsetvli zero, a2, e32, m1, ta, mu
				; CHECK-NEXT: vse32.v v8, (a0)
				; CHECK-NEXT: ret
				%va = load <vscale x 1 x i32>, <vscale x 1 x i32>* %pa
				%vb = load <vscale x 1 x i32>, <vscale x 1 x i32>* %pb
				%vc = add <vscale x 1 x i32> %va, %vb
				store <vscale x 1 x i32> %vc, <vscale x 1 x i32>* %pa
				ret void
				}

				define void @vfadd_vv_nxv1f32(<vscale x 1 x float>* %pa, <vscale x 1 x float>* %pb) {
				; CHECK-LABEL: vfadd_vv_nxv1f32:
				; CHECK: # %bb.0:
				; CHECK-NEXT: csrr a2, vlenb
				; CHECK-NEXT: srli a2, a2, 3
				; CHECK-NEXT: vsetvli zero, a2, e32, m1, ta, mu
				; CHECK-NEXT: vle32.v v8, (a0)
				; CHECK-NEXT: vle32.v v9, (a1)
				; CHECK-NEXT: vsetvli a1, zero, e32, m1, ta, mu
				; CHECK-NEXT: vfadd.vv v8, v8, v9
				; CHECK-NEXT: vsetvli zero, a2, e32, m1, ta, mu
				; CHECK-NEXT: vse32.v v8, (a0)
				; CHECK-NEXT: ret
				%va = load <vscale x 1 x float>, <vscale x 1 x float>* %pa
				%vb = load <vscale x 1 x float>, <vscale x 1 x float>* %pb
				%vc = fadd <vscale x 1 x float> %va, %vb
				store <vscale x 1 x float> %vc, <vscale x 1 x float>* %pa
				ret void
				}
				;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
				; RV32: {{.*}}
				; RV64: {{.*}}
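
In the CHECK lines of this test, each widened load and store is preceded by `csrr a2, vlenb` and `srli a2, a2, 3`. Assuming vscale is VLEN/RVVBitsPerBlock with RVVBitsPerBlock=64, as described in the review discussion, that pair appears to compute the element count of a `<vscale x 1 x T>` value, which is then used as the vector length. The sketch below mirrors the arithmetic; `evlForNxv1` is a hypothetical name used only for this illustration.

```cpp
#include <cassert>

// Hypothetical helper for illustration only. Mirrors the two instructions
//   csrr a2, vlenb    ; a2 = VLEN / 8 (vector register length in bytes)
//   srli a2, a2, 3    ; a2 = a2 / 8
// which together give VLEN / 64, i.e. VLEN / RVVBitsPerBlock.
inline unsigned evlForNxv1(unsigned VLENInBits) {
  assert(VLENInBits % 64 == 0 && "sketch assumes VLEN is a multiple of 64");
  unsigned VLenB = VLENInBits / 8; // what csrr reads from vlenb
  return VLenB >> 3;               // srli by 3
}
```

Under that assumption, VLEN=64 (the minimum implied by +zvl64b in the RUN lines) yields a vector length of 1 element, and VLEN=128 yields 2.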