This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Initial support for RVV intrinsic
Abandoned · Public

Authored by NickHung on Dec 10 2020, 12:14 AM.

Details

Reviewers: evandro, craig.topper, asb

Summary

This patch opens a discussion of the RVV intrinsic prototypes and implements code generation for the intrinsics on top of the initial infrastructure in D89449.

What this patch does:

Proposes the RVV intrinsic prototypes:
- The mask and vl arguments are optional, and each combination gets its own intrinsic.
- Naming and operand order are aligned with rvv-intrinsic-doc: https://github.com/riscv/rvv-intrinsic-doc

For example, the VADD intrinsic has four prototypes:
VADD(op1, op2)
VADD.M(mask, maskedoff, op1, op2)
VADD.VL(op1, op2, vl)
VADD.M.VL(mask, maskedoff, op1, op2, vl)
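
A minimal sketch of how the four variants relate, assuming the suffixes and operand order shown above. The `Prototype` struct and the `makePrototype` helper are hypothetical names used only for illustration; they are not part of this patch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of one intrinsic variant.
struct Prototype {
  std::string Name;
  std::vector<std::string> Operands;
};

// Build the name and operand list for one variant of a binary intrinsic.
// Masked variants take (mask, maskedoff) first; VL variants take vl last.
Prototype makePrototype(const std::string &Base, bool Masked, bool HasVL) {
  Prototype P;
  P.Name = Base;
  if (Masked) {
    P.Name += ".M";
    P.Operands = {"mask", "maskedoff"};
  }
  P.Operands.push_back("op1");
  P.Operands.push_back("op2");
  if (HasVL) {
    P.Name += ".VL";
    P.Operands.push_back("vl");
  }
  return P;
}
```

Calling `makePrototype("VADD", true, true)` yields the name `VADD.M.VL` with the operands mask, maskedoff, op1, op2, vl, which is the last prototype in the list above.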

Any thoughts on these RVV intrinsic prototypes?

Implements code generation for the VLE/VSE/VADD intrinsics without mask and vl, based on the initial infrastructure in D89449.

Do we really need such complex patterns in the target description file?
With this approach, we may need five patterns to select the four VVV-form intrinsics and one IR node for operations such as VADD and VSUB,
and five more patterns for VWADD and VWSUB. Eventually, we may suffer maintenance hell.

Our solution is to select RVV intrinsics without any pattern-matching rules.
Instead, we build two searchable tables to provide the information:

RVVLMULIndex table: tells the selection function how to determine the LMUL and SEW of an intrinsic.
RVVIntrinsicToPseudo table: looks up a pseudo RVV instruction by intrinsic and the LMUL inferred above.

Example:
RVVLMULIndex table:
(VADDVV, index 1): LMUL is inferred from the first operand.
(VWADDVV, index 1): LMUL is inferred from the first operand.
(VWADDWV, index 1, dividedBy2): LMUL is inferred from the first operand, then divided by 2.

LMULIndex = lookupLMULIndexByIntrinsic(VADDVV);
LMUL = inferLMUL(VADDVV, LMULIndex);

The LMUL can be inferred from the operand type.
See: https://github.com/riscv/rvv-intrinsic-doc/blob/master/rvv-intrinsic-rfc.md#data-types
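
As a rough illustration of the inference step, the sketch below derives LMUL from a scalable vector operand type. It assumes one vscale unit covers 64 bits, which matches the test in this revision (`<vscale x 2 x i32>` is used for the m1 case and `<vscale x 1 x i32>` for mf2). `inferLMULx8` and `lmulName` are hypothetical helpers, and LMUL is stored in eighths so that fractional values stay integral.

```cpp
#include <cassert>
#include <string>

// LMUL in eighths: mf8 = 1, mf4 = 2, mf2 = 4, m1 = 8, m2 = 16, m4 = 32, m8 = 64.
// Assumption: a <vscale x MinElts x iSEW> type with MinElts * SEW == 64 is LMUL 1.
unsigned inferLMULx8(unsigned MinElts, unsigned SEWBits,
                     bool DividedBy2 = false) {
  unsigned LMULx8 = MinElts * SEWBits / 8; // (MinElts * SEW / 64) * 8
  // Entries flagged dividedBy2 (e.g. VWADDWV) read a wide first operand,
  // so the LMUL used for selection is half of the operand's LMUL.
  return DividedBy2 ? LMULx8 / 2 : LMULx8;
}

// Map the eighths encoding to the suffix used by the pseudo instructions.
std::string lmulName(unsigned LMULx8) {
  switch (LMULx8) {
  case 1:  return "MF8";
  case 2:  return "MF4";
  case 4:  return "MF2";
  case 8:  return "M1";
  case 16: return "M2";
  case 32: return "M4";
  case 64: return "M8";
  default: return "invalid";
  }
}
```

Under that assumption, a `<vscale x 4 x i32>` operand gives M2, and a `<vscale x 4 x i64>` first operand with the dividedBy2 flag also gives M2.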

RVVIntrinsicToPseudo table:
(VADDVV, LMUL M1, VADDVV_M1)
(VADDVV, LMUL M2, VADDVV_M2)
(VADDVV, LMUL M4, VADDVV_M4)
(VADDVV, LMUL M8, VADDVV_M8)

PseudoOp = lookupPseudoByIntrinsicAndLMUL(VADDVV, LMUL);

All of the above can be done within a C++ function.
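
A standalone sketch of what such a function could rely on, using only the table entries from the examples above. The enum and struct names are hypothetical, and the linear scans are there to keep the sketch self-contained; in LLVM, tables like these would more likely be generated by TableGen's searchable-tables backend. This is not the implementation discussed in this revision, which was not posted.

```cpp
#include <cassert>
#include <cstring>

enum class Intrinsic { VADDVV, VWADDVV, VWADDWV };
enum class LMUL { M1, M2, M4, M8 };

// One RVVLMULIndex row: which operand carries the LMUL, and whether the
// inferred LMUL must be halved.
struct LMULIndexEntry {
  Intrinsic ID;
  unsigned OperandIndex;
  bool DividedBy2;
};

// One RVVIntrinsicToPseudo row.
struct PseudoEntry {
  Intrinsic ID;
  LMUL Mul;
  const char *Pseudo;
};

static const LMULIndexEntry RVVLMULIndex[] = {
    {Intrinsic::VADDVV, 1, false},
    {Intrinsic::VWADDVV, 1, false},
    {Intrinsic::VWADDWV, 1, true},
};

static const PseudoEntry RVVIntrinsicToPseudo[] = {
    {Intrinsic::VADDVV, LMUL::M1, "VADDVV_M1"},
    {Intrinsic::VADDVV, LMUL::M2, "VADDVV_M2"},
    {Intrinsic::VADDVV, LMUL::M4, "VADDVV_M4"},
    {Intrinsic::VADDVV, LMUL::M8, "VADDVV_M8"},
};

// Returns the matching row, or nullptr if the intrinsic is not in the table.
const LMULIndexEntry *lookupLMULIndexByIntrinsic(Intrinsic ID) {
  for (const LMULIndexEntry &E : RVVLMULIndex)
    if (E.ID == ID)
      return &E;
  return nullptr;
}

// Returns the pseudo instruction name, or nullptr if there is no entry.
const char *lookupPseudoByIntrinsicAndLMUL(Intrinsic ID, LMUL Mul) {
  for (const PseudoEntry &E : RVVIntrinsicToPseudo)
    if (E.ID == ID && E.Mul == Mul)
      return E.Pseudo;
  return nullptr;
}
```

A selection routine would first call `lookupLMULIndexByIntrinsic`, infer the LMUL from the operand at `OperandIndex`, and then pass that LMUL to `lookupPseudoByIntrinsicAndLMUL`.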

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

Time	Test
70 ms	x64 windows > LLVM.CodeGen/XCore::threads.ll

Event Timeline

NickHung created this revision. Dec 10 2020, 12:14 AM

Herald added subscribers: frasercrmck, luismarques, apazos and 24 others. Dec 10 2020, 12:14 AM

NickHung requested review of this revision. Dec 10 2020, 12:14 AM

Herald added a project: Restricted Project. Dec 10 2020, 12:14 AM

Herald added subscribers: llvm-commits, MaskRay.

wuiw added a subscriber: wuiw. Dec 10 2020, 12:16 AM

Harbormaster completed remote builds in B81776: Diff 310776. Dec 10 2020, 12:56 AM

craig.topper added a comment.

Are you proposing to do custom isel in RISCVISelDAGToDAG.cpp using lookupPseudoByIntrinsicAndLMUL and not using the RISCVGenDAGISel.inc table? Do you have an implementation of that yet?

llvm/test/CodeGen/RISCV/rvv/intrinsic-load-add-store-32.ll
10	This is setting the VL to VLmax, which isn't what the spec wants. It should get the value from the previous vsetvl intrinsic, or maybe from the previous intrinsic that had a vl argument. Our internal implementation has been implementing the intrinsics without vl by inserting a readvl intrinsic and a call to the intrinsics that take vl, but we've been finding issues with this. The readvl acts as an optimization barrier. It also doesn't have any ordering in IR with respect to intrinsics that have a vl argument unless we mark all intrinsics as having side effects. What are your thoughts on this?

NickHung added inline comments. Dec 10 2020, 5:40 PM

llvm/test/CodeGen/RISCV/rvv/intrinsic-load-add-store-32.ll
10	In our internal implementation, an intrinsic without vl inserts "vsetvli zero, zero, e32,m1,tu,mu" to keep the current VL. I didn't touch any code from D89449, out of respect for its authors. Please check out load-add-store-32.ll; it has the same vsetvl, which changes the current vl. Our intention is to follow the RVV intrinsic documentation (https://github.com/riscv/rvv-intrinsic-doc), so we provide all kinds of intrinsics, with and without vl. Thanks for sharing your ideas with us.

craig.topper added inline comments. Dec 10 2020, 5:49 PM

llvm/test/CodeGen/RISCV/rvv/intrinsic-load-add-store-32.ll
10	load-add-store-32.ll is intentionally using vlmax. I believe it was intended to model a scalable vector style similar to ARM SVE where there is no VL and instead a whole register of a size only known at runtime is used.

NickHung added a comment.

In D93006#2447513, @craig.topper wrote:

Are you proposing to do custom isel in RISCVISelDAGToDAG.cpp using lookupPseudoByIntrinsicAndLMUL and not using the RISCVGenDAGISel.inc table? Do you have an implementation of that yet?

Thanks for your feedback.

Yes, our internal implementation doesn't select RVV intrinsics through the generated matching table in RISCVGenDAGISel.inc, because we don't write any matching rules in RISCVInstrInfoVPseudos.td.

We referenced the code from EPI; see https://repo.hca.bsc.es/gitlab/rferrer/llvm-epi/-/blob/EPI/llvm/lib/Target/RISCV/RISCVInstrInfoEPI.td
RISCVInstrInfoEPI.td is 4323 lines long, and most of its classes are duplicated in order to cope with the different formats of RVV.
It also contains many comments such as "//// FIXME: These patterns are wrong", so we guess the author struggled when writing the matching rules.

We propose to do custom selection in RISCVISelDAGToDAG.cpp with a programmable function, and we have implemented all RVV 1.0 instructions this way.

NickHung added inline comments. Dec 10 2020, 6:11 PM

llvm/test/CodeGen/RISCV/rvv/intrinsic-load-add-store-32.ll
10	Thanks for the explanation; your understanding is correct. As I mentioned in D89449, the initial infrastructure should be shared by RVV intrinsics and IR nodes, so we know it needs a small change to support both of them for their distinct programming scenarios.

liaolucy added a subscriber: liaolucy. Dec 14 2020, 12:40 AM

liaolucy added inline comments.

llvm/test/CodeGen/RISCV/rvv/intrinsic-load-add-store-32.ll
10	Sorry, I'd like to ask a question about inserting vsetvli. For intrinsics without vl: insert vsetvli a1 (not x0), x0, i.e. vl = vlmax. For intrinsics with vl: insert vsetvli x0, a1 (not x0), i.e. vl = avl. When do I need to insert vsetvli x0, x0? The code in D89499 inserts vsetvli x0, x0 when !(VLIndex >= 0), but all VLIndex values are > 0. If my understanding is incorrect, please correct me, thanks.

craig.topper added inline comments. Dec 14 2020, 12:49 AM

llvm/test/CodeGen/RISCV/rvv/intrinsic-load-add-store-32.ll
10	I believe that code is there for vmv.x.s and some other instructions that don't read vl.
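
The three vsetvli forms discussed in this thread can be summarized in a small sketch. It only formats assembly text and assumes the operand conventions described in the comments above (a non-zero destination with a zero source for VLMAX, a zero destination with a non-zero source for an explicit AVL, and zero for both to keep the current vl). The `VLPolicy` enum and `emitVSetVLI` are hypothetical names, not code from D89449 or from this patch.

```cpp
#include <cassert>
#include <string>

enum class VLPolicy {
  VLMax,       // intrinsic without vl, whole-register style
  ExplicitAVL, // intrinsic with a vl argument
  KeepCurrent  // keep whatever vl is currently set
};

// Format the vsetvli that would precede a vector instruction.
// Reg is the scratch destination (VLMax) or the AVL source (ExplicitAVL).
std::string emitVSetVLI(VLPolicy Policy, const std::string &VType,
                        const std::string &Reg = "a1") {
  switch (Policy) {
  case VLPolicy::VLMax: // rd != x0, rs1 == x0: vl = VLMAX
    return "vsetvli " + Reg + ", zero, " + VType;
  case VLPolicy::ExplicitAVL: // rd == x0, rs1 != x0: vl is set from the AVL
    return "vsetvli zero, " + Reg + ", " + VType;
  case VLPolicy::KeepCurrent: // rd == x0, rs1 == x0: vl is left unchanged
    return "vsetvli zero, zero, " + VType;
  }
  return "";
}
```

With the `VLMax` policy and the vtype string `e32,m1,tu,mu`, the result has the same shape as the CHECK lines in intrinsic-load-add-store-32.ll.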

In D93006#2447593, @NickHung wrote:

We propose to do custom selection in RISCVISelDAGToDAG.cpp with a programmable function and we've implemented all RVV 1.0 instructions in this way.

I believe we now have all of the 0.9 pseudo instructions for the base spec in. Zvlsseg and Zvamo are being prepared. After looking at the size of the RISCVGenDAGISel.inc table, I'm very interested to see your RISCVISelDAGToDAG.cpp implementation. Do you have code you are able to share?

NickHung abandoned this revision. Mar 16 2021, 11:57 PM

Herald added a subscriber: vkmr. Mar 16 2021, 11:57 PM

Revision Contents

Path	Size
llvm/include/llvm/IR/IntrinsicsRISCV.td	81 lines
llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td	80 lines
llvm/test/CodeGen/RISCV/rvv/intrinsic-load-add-store-32.ll	118 lines

Diff 310776

llvm/include/llvm/IR/IntrinsicsRISCV.td

 //===- IntrinsicsRISCV.td - Defines RISCV intrinsics -------- tablegen --===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
 // SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
 //
 //===----------------------------------------------------------------------===//
 //
 // This file defines all of the RISCV-specific intrinsics.
 //
 //===----------------------------------------------------------------------===//

 //===----------------------------------------------------------------------===//
+class GenForm<bits<3> Gen> {
+  bit VV = Gen{2};
+  bit VX = Gen{1};
+  bit VI = Gen{0};
+}
+def GenVV_VX_VI : GenForm<0b111>;
+def GenVV_VX: GenForm<0b110>;
+def GenVX_VI: GenForm<0b011>;
+
 // Atomics

 // Atomic Intrinsics have multiple versions for different access widths, which
 // all follow one of the following signatures (depending on how many arguments
 // they require). We carefully instantiate only specific versions of these for
 // specific integer widths, rather than using `llvm_anyint_ty`.
 //
 // In fact, as these intrinsics take `llvm_anyptr_ty`, the given names are the
@@ 38 lines hidden @@ let TargetPrefix = "riscv" in {
   defm int_riscv_masked_atomicrmw_min : MaskedAtomicRMWFiveArgIntrinsics;
   // Unsigned min and max don't need the extra operand.
   defm int_riscv_masked_atomicrmw_umax : MaskedAtomicRMWFourArgIntrinsics;
   defm int_riscv_masked_atomicrmw_umin : MaskedAtomicRMWFourArgIntrinsics;

   // @llvm.riscv.masked.cmpxchg.{i32,i64}.<p>(...)
   defm int_riscv_masked_cmpxchg : MaskedAtomicRMWFiveArgIntrinsics;

+
+  // optional vl
+  multiclass RVV_VL<list<LLVMType> output, list<LLVMType> input,
+                    list<IntrinsicProperty> attr> {
+
+    defvar vl_input = !listconcat(input, [llvm_anyint_ty]);
+
+    def "" : Intrinsic<output, input, attr>;
+    def _VL : Intrinsic<output, vl_input, attr>;
+  }
+
+  // optional mask and maskedoff
+  multiclass RVV_M<list<LLVMType> output, list<LLVMType> input,
+                   list<IntrinsicProperty> attr, bit WithMO = true> {
+    defvar MaskInput = !listconcat(!if(WithMO, [LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>,
+                                   LLVMMatchType<0>], [LLVMScalarOrSameVectorWidth<0, llvm_i1_ty>]), input);
+
+    defm "" : RVV_VL<output, input, attr>;
+    defm _M : RVV_VL<output, MaskInput, attr>;
+  }
+
+  // Load unit
+  multiclass RVV_LOAD_UNIT {
+    defvar output = [llvm_anyvector_ty];
+    defvar input = [llvm_anyptr_ty];
+    defvar attr = [IntrReadMem, IntrArgMemOnly];
+
+    defm "" : RVV_M<output, input, attr>;
+  }
+
+  // store unit
+  multiclass RVV_STORE_UNIT {
+    defvar output = []<LLVMType>;
+    defvar input = [llvm_anyvector_ty, llvm_anyptr_ty];
+    defvar attr = [IntrWriteMem, IntrArgMemOnly];
+
+    defm "" : RVV_M<output, input, attr, false>;
+  }
+  //
+  // Load/Store
+  // support EEW up to ELEN = 64
+  foreach EEW = [8, 16, 32, 64] in {
+    defm int_riscv_VLE#EEW#_V : RVV_LOAD_UNIT;
+    defm int_riscv_VSE#EEW#_V : RVV_STORE_UNIT;
+  }
+
+
+  // with optional mask and maskedoff
+  multiclass RVV_ALU_OM<list<LLVMType> output, list<list<LLVMType>> inputs, GenForm Gen> {
+    defvar attr = [IntrNoMem];
+    defvar attr_vi = [IntrNoMem, ImmArg<ArgIndex<1>>];
+
+    if Gen.VV then
+      defm _VV : RVV_M<output, inputs[0], attr>;
+    if Gen.VX then
+      defm _VX : RVV_M<output, inputs[1], attr>;
+    if Gen.VI then
+      defm _VI : RVV_M<output, inputs[2], attr_vi>;
+  }
+
+
+  defvar RVV_ALU_VV_V_X_I_OM_List = ["VADD"];
+  foreach I = RVV_ALU_VV_V_X_I_OM_List in {
+    defvar output = [llvm_anyvector_ty];
+    defvar input_vv = [LLVMMatchType<0>, LLVMMatchType<0>];
+    defvar input_vx = [LLVMMatchType<0>, llvm_anyint_ty];
+    defvar input_vi = [LLVMMatchType<0>, llvm_anyint_ty];
+    defvar inputs = [input_vv, input_vx, input_vi];
+
+    defm int_riscv_#I : RVV_ALU_OM<output, inputs, GenVV_VX_VI>;
+  }
+
 } // TargetPrefix = "riscv"

llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td

@@ 176 lines hidden @@ def : Pat<(result_type (vop
                 (op_type op_reg_class:$rs2))),
           (instruction (result_type (IMPLICIT_DEF)),
                        op_reg_class:$rs1,
                        op_reg_class:$rs2,
                        (mask_type zero_reg),
                        VLMax, sew)>;
 }

+multiclass pat_intrinsic_binary<Intrinsic fromOp, Instruction toOp,
+                                ValueType result_type, ValueType op1_type,
+                                ValueType op2_type, ValueType mask_type,
+                                int sew, LMULInfo vlmul, VReg result_reg_class,
+                                VReg op_reg_class, DAGOperand op2_kind> {
+
+  def : Pat<(result_type (fromOp (op1_type op_reg_class:$op1),
+                                 (op2_type op2_kind:$op2))),
+            (toOp (result_type (IMPLICIT_DEF)), op_reg_class:$op1,
+                  op2_kind:$op2,
+                  (mask_type zero_reg), VLMax, sew)>;
+}
+
+
+
+multiclass pat_intrinsic_binary_common<string op_name, GenForm Gen = GenVV_VX_VI>
+{
+  foreach vti = AllIntegerVectors in {
+    defvar LMulSuffix = vti.LMul.MX;
+    defvar op_vv_name = op_name#_VV;
+    defvar op_vx_name = op_name#_VX;
+    defvar op_vi_name = op_name#_VI;
+
+    if Gen.VV then {
+      defvar intrinsic_vv = !cast<Intrinsic>("int_riscv_"#op_vv_name);
+      defvar pseudo_vv = !cast<Instruction>("Pseudo"#op_vv_name#_#LMulSuffix);
+      defm : pat_intrinsic_binary<intrinsic_vv, pseudo_vv, vti.Vector,
+                                  vti.Vector, vti.Vector, vti.Mask, vti.SEW,
+                                  vti.LMul, vti.RegClass, vti.RegClass,
+                                  vti.RegClass>;
+    }
+    if Gen.VX then {
+      defvar intrinsic_vx = !cast<Intrinsic>("int_riscv_"#op_vx_name);
+      defvar pseudo_vx = !cast<Instruction>("Pseudo"#op_vx_name#_#LMulSuffix);
+      defm : pat_intrinsic_binary<intrinsic_vx, pseudo_vx, vti.Vector,
+                                  vti.Vector, XLenVT, vti.Mask, vti.SEW,
+                                  vti.LMul, vti.RegClass, vti.RegClass, GPR>;
+    }
+    if Gen.VI then {
+      defvar intrinsic_vi = !cast<Intrinsic>("int_riscv_"#op_vi_name);
+      defvar pseudo_vi = !cast<Instruction>("Pseudo"#op_vi_name#_#LMulSuffix);
+      defm : pat_intrinsic_binary<intrinsic_vi, pseudo_vi, vti.Vector,
+                                  vti.Vector, XLenVT, vti.Mask, vti.SEW,
+                                  vti.LMul, vti.RegClass, vti.RegClass, simm5>;
+    }
+  }
+}
+
 multiclass pat_vop_binary_common<SDNode vop,
                                  string instruction_name,
                                  list<VTypeInfo> vtilist>
 {
   foreach vti = vtilist in
   defm : pat_vop_binary<vop, instruction_name,
                         vti.Vector, vti.Vector, vti.Mask, vti.SEW,
                         vti.LMul, vti.RegClass, vti.RegClass>;
@@ 55 lines hidden @@ let mayLoad = 0, mayStore = 1, hasSideEffects = 0,
           (ins vreg:$rd, GPR:$rs1, VMaskOp:$mask, GPR:$vl,
                ixlenimm:$sew),
           []>,
         RISCVVPseudo;
     }
   }
 }

+multiclass pat_intrinsic_load<string op_name>
+{
+  foreach vti = AllVectors in {
+    defvar intrinsic = !cast<Intrinsic>("int_riscv_"#op_name # vti.SEW # "_V");
+    defvar instruction = !cast<Instruction>("Pseudo"#op_name # vti.SEW # "_V_"#
+                                            vti.LMul.MX);
+    def : Pat<(vti.Vector (intrinsic GPR:$rs1)),
+              (instruction (vti.Vector (IMPLICIT_DEF)), GPR:$rs1,
+                           (vti.Mask zero_reg), VLMax, vti.SEW)>;
+  }
+}
+
+multiclass pat_intrinsic_store<string op_name>
+{
+  foreach vti = AllVectors in {
+    defvar intrinsic = !cast<Intrinsic>("int_riscv_"#op_name # vti.SEW # "_V");
+    defvar instruction = !cast<Instruction>("Pseudo"#op_name # vti.SEW # "_V_"#
+                                            vti.LMul.MX);
+    def : Pat<(intrinsic vti.Vector:$rs2, GPR:$rs1),
+              (instruction vti.RegClass:$rs2, GPR:$rs1, (vti.Mask zero_reg),
+                           VLMax, vti.SEW)>;
+  }
+
+}
 // Patterns.
 multiclass pat_load_store<LLVMType type,
                           LLVMType mask_type,
                           int sew,
                           LMULInfo vlmul,
                           VReg reg_class>
 {
   defvar load_instr = !cast<Instruction>("PseudoVLE" # sew # "_V_"# vlmul.MX);
@@ 22 lines hidden @@
 }

 foreach vti = AllVectors in
 {
   defm : pat_load_store<vti.Vector, vti.Mask,
                         vti.SEW, vti.LMul, vti.RegClass>;
 }

+defm "" : pat_intrinsic_load<"VLE">;
+defm "" :pat_intrinsic_store<"VSE">;
 //===----------------------------------------------------------------------===//
 // 12. Vector Integer Arithmetic Instructions
 //===----------------------------------------------------------------------===//

 //===----------------------------------------------------------------------===//
 // 12.1. Vector Single-Width Integer Add and Subtract
 //===----------------------------------------------------------------------===//

 // Pseudo instructions.
 defm PseudoVADD : pseudo_binary_v_vv_vx_vi;

 // Whole-register vector patterns.
 defm "" : pat_vop_binary_common<add, "PseudoVADD", AllIntegerVectors>;

+
+// intrinsic pattern
+defvar VALU_V_X_I_List = ["VADD"];
+foreach I = VALU_V_X_I_List in
+  defm "" : pat_intrinsic_binary_common<I, GenVV_VX_VI>;
+
 } // Predicates = [HasStdExtV]

llvm/test/CodeGen/RISCV/rvv/intrinsic-load-add-store-32.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple riscv32 -mattr=+experimental-v %s -o - \
				; RUN: -verify-machineinstrs | FileCheck %s
				; RUN: llc -mtriple riscv64 -mattr=+experimental-v %s -o - \
				; RUN: -verify-machineinstrs | FileCheck %s

				define void @vadd_vint32m1(<vscale x 2 x i32> %pc, <vscale x 2 x i32> %pa, <vscale x 2 x i32> *%pb) nounwind {
				; CHECK-LABEL: vadd_vint32m1:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetvli a3, zero, e32,m1,tu,mu
				; CHECK-NEXT: vle32.v v25, (a1)
				; CHECK-NEXT: vsetvli a1, zero, e32,m1,tu,mu
				; CHECK-NEXT: vle32.v v26, (a2)
				; CHECK-NEXT: vsetvli a1, zero, e32,m1,tu,mu
				; CHECK-NEXT: vadd.vv v25, v25, v26
				; CHECK-NEXT: vsetvli a1, zero, e32,m1,tu,mu
				; CHECK-NEXT: vse32.v v25, (a0)
				; CHECK-NEXT: ret
				%va = tail call <vscale x 2 x i32> @llvm.riscv.VLE32.V.nxv2i32.p0nxv2i32(<vscale x 2 x i32> * %pa)
				%vb = tail call <vscale x 2 x i32> @llvm.riscv.VLE32.V.nxv2i32.p0nxv2i32(<vscale x 2 x i32> * %pb)
				%vc = tail call <vscale x 2 x i32> @llvm.riscv.VADD.VV.nxv2i32(<vscale x 2 x i32> %va, <vscale x 2 x i32> %vb)
				tail call void @llvm.riscv.VSE32.V.nxv2i32.p0nxv2i32(<vscale x 2 x i32> %vc, <vscale x 2 x i32>* %pc)
				ret void
				}

				define void @vadd_vint32m2(<vscale x 4 x i32> %pc, <vscale x 4 x i32> %pa, <vscale x 4 x i32> *%pb) nounwind {
				; CHECK-LABEL: vadd_vint32m2:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetvli a3, zero, e32,m2,tu,mu
				; CHECK-NEXT: vle32.v v26, (a1)
				; CHECK-NEXT: vsetvli a1, zero, e32,m2,tu,mu
				; CHECK-NEXT: vle32.v v28, (a2)
				; CHECK-NEXT: vsetvli a1, zero, e32,m2,tu,mu
				; CHECK-NEXT: vadd.vv v26, v26, v28
				; CHECK-NEXT: vsetvli a1, zero, e32,m2,tu,mu
				; CHECK-NEXT: vse32.v v26, (a0)
				; CHECK-NEXT: ret
				%va = tail call <vscale x 4 x i32> @llvm.riscv.VLE32.V.nxv4i32.p0i32(<vscale x 4 x i32> * %pa)
				%vb = tail call <vscale x 4 x i32> @llvm.riscv.VLE32.V.nxv4i32.p0i32(<vscale x 4 x i32> * %pb)
				%vc = tail call <vscale x 4 x i32> @llvm.riscv.VADD.VV.nxv4i32(<vscale x 4 x i32> %va, <vscale x 4 x i32> %vb)
				tail call void @llvm.riscv.VSE32.V.nxv4i32.p0nxv4i32(<vscale x 4 x i32> %vc, <vscale x 4 x i32>* %pc)
				ret void
				}

				define void @vadd_vint32m4(<vscale x 8 x i32> %pc, <vscale x 8 x i32> %pa, <vscale x 8 x i32> *%pb) nounwind {
				; CHECK-LABEL: vadd_vint32m4:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetvli a3, zero, e32,m4,tu,mu
				; CHECK-NEXT: vle32.v v28, (a1)
				; CHECK-NEXT: vsetvli a1, zero, e32,m4,tu,mu
				; CHECK-NEXT: vle32.v v8, (a2)
				; CHECK-NEXT: vsetvli a1, zero, e32,m4,tu,mu
				; CHECK-NEXT: vadd.vv v28, v28, v8
				; CHECK-NEXT: vsetvli a1, zero, e32,m4,tu,mu
				; CHECK-NEXT: vse32.v v28, (a0)
				; CHECK-NEXT: ret
				%va = tail call <vscale x 8 x i32> @llvm.riscv.VLE32.V.nxv8i32.p0i32(<vscale x 8 x i32> * %pa)
				%vb = tail call <vscale x 8 x i32> @llvm.riscv.VLE32.V.nxv8i32.p0i32(<vscale x 8 x i32> * %pb)
				%vc = tail call <vscale x 8 x i32> @llvm.riscv.VADD.VV.nxv8i32(<vscale x 8 x i32> %va, <vscale x 8 x i32> %vb)
				tail call void @llvm.riscv.VSE32.V.nxv8i32.p0nxv8i32(<vscale x 8 x i32> %vc, <vscale x 8 x i32>* %pc)
				ret void
				}

				define void @vadd_vint32m8(<vscale x 16 x i32> %pc, <vscale x 16 x i32> %pa, <vscale x 16 x i32> *%pb) nounwind {
				; CHECK-LABEL: vadd_vint32m8:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetvli a3, zero, e32,m8,tu,mu
				; CHECK-NEXT: vle32.v v8, (a1)
				; CHECK-NEXT: vsetvli a1, zero, e32,m8,tu,mu
				; CHECK-NEXT: vle32.v v16, (a2)
				; CHECK-NEXT: vsetvli a1, zero, e32,m8,tu,mu
				; CHECK-NEXT: vadd.vv v8, v8, v16
				; CHECK-NEXT: vsetvli a1, zero, e32,m8,tu,mu
				; CHECK-NEXT: vse32.v v8, (a0)
				; CHECK-NEXT: ret
				%va = tail call <vscale x 16 x i32> @llvm.riscv.VLE32.V.nxv16i32.p0i32(<vscale x 16 x i32> * %pa)
				%vb = tail call <vscale x 16 x i32> @llvm.riscv.VLE32.V.nxv16i32.p0i32(<vscale x 16 x i32> * %pb)
				%vc = tail call <vscale x 16 x i32> @llvm.riscv.VADD.VV.nxv16i32(<vscale x 16 x i32> %va, <vscale x 16 x i32> %vb)
				tail call void @llvm.riscv.VSE32.V.nxv16i32.p0nxv16i32(<vscale x 16 x i32> %vc, <vscale x 16 x i32>* %pc)
				ret void
				}

				define void @vadd_vint32mf2(<vscale x 1 x i32> %pc, <vscale x 1 x i32> %pa, <vscale x 1 x i32> *%pb) nounwind {
				; CHECK-LABEL: vadd_vint32mf2:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetvli a3, zero, e32,mf2,tu,mu
				; CHECK-NEXT: vle32.v v25, (a1)
				; CHECK-NEXT: vsetvli a1, zero, e32,mf2,tu,mu
				; CHECK-NEXT: vle32.v v26, (a2)
				; CHECK-NEXT: vsetvli a1, zero, e32,mf2,tu,mu
				; CHECK-NEXT: vadd.vv v25, v25, v26
				; CHECK-NEXT: vsetvli a1, zero, e32,mf2,tu,mu
				; CHECK-NEXT: vse32.v v25, (a0)
				; CHECK-NEXT: ret
				%va = tail call <vscale x 1 x i32> @llvm.riscv.VLE32.V.nxv1i32.p0i32(<vscale x 1 x i32> * %pa)
				%vb = tail call <vscale x 1 x i32> @llvm.riscv.VLE32.V.nxv1i32.p0i32(<vscale x 1 x i32> * %pb)
				%vc = tail call <vscale x 1 x i32> @llvm.riscv.VADD.VV.nxv1i32(<vscale x 1 x i32> %va, <vscale x 1 x i32> %vb)
				tail call void @llvm.riscv.VSE32.V.nxv1i32.p0nxv1i32(<vscale x 1 x i32> %vc, <vscale x 1 x i32>* %pc)
				ret void
				}
				declare <vscale x 2 x i32> @llvm.riscv.VLE32.V.nxv2i32.p0nxv2i32(<vscale x 2 x i32>*)
				declare <vscale x 4 x i32> @llvm.riscv.VLE32.V.nxv4i32.p0i32(<vscale x 4 x i32>*)
				declare <vscale x 8 x i32> @llvm.riscv.VLE32.V.nxv8i32.p0i32(<vscale x 8 x i32>*)
				declare <vscale x 16 x i32> @llvm.riscv.VLE32.V.nxv16i32.p0i32(<vscale x 16 x i32>*)
				declare <vscale x 1 x i32> @llvm.riscv.VLE32.V.nxv1i32.p0i32(<vscale x 1 x i32>*)

				declare <vscale x 2 x i32> @llvm.riscv.VADD.VV.nxv2i32(<vscale x 2 x i32>, <vscale x 2 x i32>)
				declare <vscale x 4 x i32> @llvm.riscv.VADD.VV.nxv4i32(<vscale x 4 x i32>, <vscale x 4 x i32>)
				declare <vscale x 8 x i32> @llvm.riscv.VADD.VV.nxv8i32(<vscale x 8 x i32>, <vscale x 8 x i32>)
				declare <vscale x 16 x i32> @llvm.riscv.VADD.VV.nxv16i32(<vscale x 16 x i32>, <vscale x 16 x i32>)
				declare <vscale x 1 x i32> @llvm.riscv.VADD.VV.nxv1i32(<vscale x 1 x i32>, <vscale x 1 x i32>)

				declare void @llvm.riscv.VSE32.V.nxv2i32.p0nxv2i32(<vscale x 2 x i32>, <vscale x 2 x i32>*)
				declare void @llvm.riscv.VSE32.V.nxv4i32.p0nxv4i32(<vscale x 4 x i32>, <vscale x 4 x i32>*)
				declare void @llvm.riscv.VSE32.V.nxv8i32.p0nxv8i32(<vscale x 8 x i32>, <vscale x 8 x i32>*)
				declare void @llvm.riscv.VSE32.V.nxv16i32.p0nxv16i32(<vscale x 16 x i32>, <vscale x 16 x i32>*)
				declare void @llvm.riscv.VSE32.V.nxv1i32.p0nxv1i32(<vscale x 1 x i32>, <vscale x 1 x i32>*)