llvm/test/CodeGen/RISCV/rvv/load-mask.ll
15	How do we ensure that the location we're loading/storing is the right size for this? The mask is (vlenb/8) * 64 * 1 bits. But the load/store size is (vlenb/8) * 64 * 8 bits.

rogfer01 added inline comments. Dec 16 2020, 2:15 PM

llvm/test/CodeGen/RISCV/rvv/load-mask.ll
15	I imagine we could set `vl=max(1,vlenb/(8*8)),sew=8` in this case rather than `vl=vlmax,sew=8`. We still have to load/store at least one `i8` (hence the `max`). This is what already happens when a scalar load/store of `i1` appears in IR. However I'm not sure whether this scenario in IR will happen very often. If it doesn't then I imagine the slightly less straightforward code generation may be OK?

rogfer01 added inline comments. Dec 16 2020, 2:17 PM

llvm/test/CodeGen/RISCV/rvv/load-mask.ll
15	I forgot to account for `lmul`, so I think a reasonable `vl` would be `vl=max(1, (vlenb*lmul)/(8*8))`

HsiangKai added inline comments. Dec 16 2020, 10:25 PM

llvm/test/CodeGen/RISCV/rvv/load-mask.ll
15	I configure the load/store using e8,m1. The load/store size is (vlenb/8) * 8 * 8 bits, not (vlenb/8) * 64 * 8. (vlenb/8) * 64 * 8 is e8,m8. You could imagine that I treat the load/store as a load/store of the <vscale x 8 x i8> type. Why do we need to consider LMUL here? All mask types are stored in one vector register. All the loads/stores for mask types should use PseudoVLE#sew#_V_M1/PseudoVSE#sew#_V_M1. We will reserve a whole vector register size on the stack for mask types. Using e8,m1 and vl=VLMAX should be able to correctly read out the mask values.

craig.topper added inline comments. Dec 16 2020, 10:35 PM

llvm/test/CodeGen/RISCV/rvv/load-mask.ll
15	You're right, I did do that math wrong. So the vscale x 64 x i1 is ok. But we're using e8,m1 with vlmax for types smaller than vscale x 64 x i1 as well, right? When you say "We will reserve a whole vector register size in stack for mask types", you mean for spills and reloads? That's a different case than these IR tests, right?

HsiangKai added inline comments. Dec 16 2020, 11:04 PM

llvm/test/CodeGen/RISCV/rvv/load-mask.ll
15	I think so. For vscale x 32 x i1, we still use e8,m1 with vlmax to read the whole vector out. Yes, what I have in mind is spilling and argument passing through the stack. We have not implemented frame handling upstream, so I created the test cases in this way. I will prepare the frame handling for RISC-V V later.
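The size arithmetic in this inline thread can be checked with a small sketch. It is not part of the patch; the helper names are hypothetical, and it assumes vscale = VLEN / 64 (which matches the thread's vlenb/8) and integer LMUL only.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: vscale = VLEN / 64, i.e. vlenb / 8 as used in the thread.
constexpr uint64_t vscaleFor(uint64_t VLenBits) { return VLenBits / 64; }

// Size in bits of a <vscale x N x i1> mask value.
constexpr uint64_t maskBits(uint64_t VLenBits, uint64_t N) {
  return vscaleFor(VLenBits) * N;
}

// Bits touched by a unit-stride access with SEW=8, vl=VLMAX and integer LMUL.
constexpr uint64_t e8AccessBits(uint64_t VLenBits, uint64_t LMul) {
  uint64_t VLMax = VLenBits * LMul / 8; // VLMAX = LMUL * VLEN / SEW
  return VLMax * 8;                     // each element is 8 bits wide
}

// Hypothetical demo for VLEN = 128.
void checkMaskSizes() {
  assert(maskBits(128, 64) == 128);      // <vscale x 64 x i1> fills one register
  assert(e8AccessBits(128, 1) == 128);   // e8,m1 touches exactly one register
  assert(e8AccessBits(128, 8) == 1024);  // e8,m8 would touch eight registers
  assert(maskBits(128, 32) == 64);       // smaller than what e8,m1 touches
}
```

Under these assumptions, e8,m1 with vl=VLMAX matches the size of <vscale x 64 x i1> exactly, while for <vscale x 32 x i1> and smaller types the same access covers more memory than the mask occupies, which is the case the following comments discuss.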

D93368 will depend on this commit. For example, there are four vector arguments in the masked version of vmseq, i.e., maskedoff, varg0, varg1, mask. When LMUL = 4, we could pass the arguments varg0 and varg1 through vector registers. The first mask type argument will be put in v0. The second mask type argument, i.e., mask, will be passed through the stack. Its address will be stored in a GPR. We need to load the mask value from the stack. Vector argument passing is another issue. We could create another patch for it.

Are you saying the tests in D93368 don't pass without this commit?

Yes, I mean the test cases for LMUL = 4 and LMUL = 8 will not pass without this commit.

My concern is that I'm not convinced that something like this is correct:

%a = alloca <vscale x 32 x i1>
store <vscale x 32 x i1> %c, <vscale x 32 x i1>* %a

getTypeSize for that alloca would return a scalable result with a fixed size of 32. Would we still end up allocating (vlen / 8) * 64 bytes on the stack for that, or would it be (vlen / 8) * 32 bytes?

We could use the size of one vector register as the unit when allocating stack space for RVV objects. Even when a mask type variable is smaller than one vector register, we could allocate a full vector register for it. The same applies to fractional LMUL variables. There are whole register loads/stores in v1.0 to access RVV objects on the stack for LMUL = 1, 2, 4, 8.

In our downstream implementation, there is a snippet to calculate how many vectors to reserve in the stack for RVV objects.

int64_t ObjectSize = MFI.getObjectSize(FI);

unsigned ShiftAmount;
// Mask objects may be logically smaller than the spill size of the VR
// class.
if (ObjectSize <= TRI->getSpillSize(RISCV::VRRegClass))
  ShiftAmount = 0;
else if (ObjectSize == TRI->getSpillSize(RISCV::VRM2RegClass))
  ShiftAmount = 1;
else if (ObjectSize == TRI->getSpillSize(RISCV::VRM4RegClass))
  ShiftAmount = 2;
else if (ObjectSize == TRI->getSpillSize(RISCV::VRM8RegClass))
  ShiftAmount = 3;
else
  llvm_unreachable("Unexpected object size");
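The snippet above depends on LLVM's MachineFrameInfo and TargetRegisterInfo. A self-contained sketch of the same mapping follows; the function name and the `RegBytes` parameter (standing in for the spill size of one vector register) are hypothetical and only illustrate the idea.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the downstream logic: returns log2 of the number
// of whole vector registers to reserve for a stack object, or -1 when the
// size does not correspond to a register group of 1, 2, 4 or 8 registers.
// RegBytes is an assumed spill size, in bytes, of a single vector register.
int shiftAmountFor(int64_t ObjectSize, int64_t RegBytes) {
  // Mask and fractional-LMUL objects may be smaller than one register;
  // they still get a full register.
  if (ObjectSize <= RegBytes)
    return 0;
  for (int Shift = 1; Shift <= 3; ++Shift)
    if (ObjectSize == (RegBytes << Shift))
      return Shift;
  return -1; // corresponds to llvm_unreachable("Unexpected object size")
}

// Hypothetical demo with an 8-byte register spill size.
void checkShiftAmounts() {
  assert(shiftAmountFor(4, 8) == 0);   // smaller than one register
  assert(shiftAmountFor(16, 8) == 1);  // two registers
  assert(shiftAmountFor(64, 8) == 3);  // eight registers
}
```

The sketch returns -1 instead of aborting so that the unexpected-size case can be observed by the caller.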

craig.topper mentioned this in D93368: [RISCV] Define vector compare intrinsics. Dec 19 2020, 2:21 PM

khchen mentioned this in D93823: [RISCV] Define vmsbf.m/vmsif.m/vmsof.m/viota.m/vid.v intrinsics. Dec 26 2020, 9:00 AM

Rebase on master.

Harbormaster completed remote builds in B83562: Diff 313818. Dec 27 2020, 8:39 PM

khchen mentioned this in rGe673d4019947: [RISCV] Define vmsbf.m/vmsif.m/vmsof.m/viota.m/vid.v intrinsics. Dec 28 2020, 6:16 AM

Considering the frame handling in D93750, is it reasonable to load/store whole vector registers for mask types, regardless of which mask type is involved?

Use vle1.v/vse1.v to load/store mask types.
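As a rough illustration of why vle1.v/vse1.v fit mask types: per the RVV specification, these instructions behave like unmasked byte accesses with an effective vector length of ceil(vl/8). The sketch below computes the bytes accessed for the vsetvli settings used in load-mask.ll; the helper names are hypothetical and VLEN = 128 is only an example value.

```cpp
#include <cassert>
#include <cstdint>

// VLMAX = LMUL * VLEN / SEW, with LMUL given as the fraction LMulNum/LMulDen.
constexpr uint64_t vlmax(uint64_t VLenBits, uint64_t Sew, uint64_t LMulNum,
                         uint64_t LMulDen) {
  return VLenBits * LMulNum / (Sew * LMulDen);
}

// Bytes transferred by vle1.v/vse1.v for a given vl: ceil(vl / 8).
constexpr uint64_t maskAccessBytes(uint64_t VL) { return (VL + 7) / 8; }

// Hypothetical demo for VLEN = 128.
void checkMaskAccess() {
  // e8,m8 (<vscale x 64 x i1>): vl = 128 mask bits, 16 bytes accessed.
  assert(maskAccessBytes(vlmax(128, 8, 8, 1)) == 16);
  // e8,m4 (<vscale x 32 x i1>): vl = 64 mask bits, 8 bytes accessed.
  assert(maskAccessBytes(vlmax(128, 8, 4, 1)) == 8);
  // e8,mf8 (<vscale x 1 x i1>): vl = 2 mask bits, 1 byte accessed.
  assert(maskAccessBytes(vlmax(128, 8, 1, 8)) == 1);
}
```

With this scheme the number of bytes accessed follows the mask type's own size rather than a whole vector register, which addresses the size concern raised earlier in the review.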

Herald added subscribers: StephenFan, vkmr. Feb 1 2021, 3:45 PM

HsiangKai edited the summary of this revision. Feb 1 2021, 3:46 PM

HsiangKai added a parent revision: D95781: [RISCV] Add new vector instructions in v0.10.

Harbormaster completed remote builds in B87442: Diff 320628. Feb 1 2021, 5:53 PM

LGTM. There will be a conflict with D95844 so be careful.

This revision is now accepted and ready to land. Feb 2 2021, 1:47 AM

LGTM

This patch uses vle1.v/vse1.v. It depends on D95781. I will commit it after D95781 is accepted.

This revision was landed with ongoing or failed builds. Feb 2 2021, 9:44 PM

Closed by commit rG63baeec66e7f: [RISCV] Load/store vector mask types. (authored by HsiangKai).

This revision was automatically updated to reflect the committed changes.

HsiangKai added a commit: rG63baeec66e7f: [RISCV] Load/store vector mask types.

Diff 321000

llvm/lib/Target/RISCV/RISCVInstrInfoVSDPatterns.td

 [39 lines not shown]
 def SplatPat_uimm5 : ComplexPattern<vAny, 1, "selectVSplatUimm5", []>;

 def RVVBaseAddr : ComplexPattern<iPTR, 1, "SelectRVVBaseAddr">;

 class SwapHelper<dag Prefix, dag A, dag B, dag Suffix, bit swap> {
   dag Value = !con(Prefix, !if(swap, B, A), !if(swap, A, B), Suffix);
 }

-multiclass VPatUSLoadStoreSDNode<ValueType type,
-                                 ValueType mask_type,
+multiclass VPatUSLoadStoreSDNode<LLVMType type,
                                  int sew,
                                  LMULInfo vlmul,
                                  OutPatFrag avl,
                                  VReg reg_class>
 {
   defvar load_instr = !cast<Instruction>("PseudoVLE"#sew#"_V_"#vlmul.MX);
   defvar store_instr = !cast<Instruction>("PseudoVSE"#sew#"_V_"#vlmul.MX);
   // Load
   def : Pat<(type (load RVVBaseAddr:$rs1)),
             (load_instr RVVBaseAddr:$rs1, avl, sew)>;
   // Store
   def : Pat<(store type:$rs2, RVVBaseAddr:$rs1),
             (store_instr reg_class:$rs2, RVVBaseAddr:$rs1, avl, sew)>;
 }

+multiclass VPatUSLoadStoreMaskSDNode<MTypeInfo m>
+{
+  defvar load_instr = !cast<Instruction>("PseudoVLE1_V_"#m.BX);
+  defvar store_instr = !cast<Instruction>("PseudoVSE1_V_"#m.BX);
+  // Load
+  def : Pat<(m.Mask (load RVVBaseAddr:$rs1)),
+            (load_instr RVVBaseAddr:$rs1, m.AVL, m.SEW)>;
+  // Store
+  def : Pat<(store m.Mask:$rs2, RVVBaseAddr:$rs1),
+            (store_instr VR:$rs2, RVVBaseAddr:$rs1, m.AVL, m.SEW)>;
+}
+
 class VPatBinarySDNode_VV<SDNode vop,
                           string instruction_name,
                           ValueType result_type,
                           ValueType op_type,
                           ValueType mask_type,
                           int sew,
                           LMULInfo vlmul,
                           OutPatFrag avl,
 [274 lines not shown]
 //===----------------------------------------------------------------------===//
 // Patterns.
 //===----------------------------------------------------------------------===//

 let Predicates = [HasStdExtV] in {

 // 7.4. Vector Unit-Stride Instructions
 foreach vti = AllVectors in
-  defm "" : VPatUSLoadStoreSDNode<vti.Vector, vti.Mask, vti.SEW, vti.LMul,
+  defm "" : VPatUSLoadStoreSDNode<vti.Vector, vti.SEW, vti.LMul,
                                   vti.AVL, vti.RegClass>;
+foreach mti = AllMasks in
+  defm "" : VPatUSLoadStoreMaskSDNode<mti>;

 // 12.1. Vector Single-Width Integer Add and Subtract
 defm "" : VPatBinarySDNode_VV_VX_VI<add, "PseudoVADD">;
 defm "" : VPatBinarySDNode_VV_VX<sub, "PseudoVSUB">;
 // Handle VRSUB specially since it's the only integer binary op with reversed
 // pattern operands
 foreach vti = AllIntegerVectors in {
   def : Pat<(sub (vti.Vector (SplatPat XLenVT:$rs2)),
 [444 lines not shown]

llvm/test/CodeGen/RISCV/rvv/load-mask.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc -mtriple riscv32 -mattr=+experimental-v %s -o - \
				; RUN: -verify-machineinstrs | FileCheck %s
				; RUN: llc -mtriple riscv64 -mattr=+experimental-v %s -o - \
				; RUN: -verify-machineinstrs | FileCheck %s

				define void @test_load_mask_64(<vscale x 64 x i1>* %pa, <vscale x 64 x i1>* %pb) {
				; CHECK-LABEL: test_load_mask_64:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetvli a2, zero, e8,m8,ta,mu
				; CHECK-NEXT: vle1.v v25, (a0)
				; CHECK-NEXT: vse1.v v25, (a1)
				; CHECK-NEXT: ret
				%a = load <vscale x 64 x i1>, <vscale x 64 x i1>* %pa
				store <vscale x 64 x i1> %a, <vscale x 64 x i1>* %pb
				ret void
				}

				define void @test_load_mask_32(<vscale x 32 x i1>* %pa, <vscale x 32 x i1>* %pb) {
				; CHECK-LABEL: test_load_mask_32:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetvli a2, zero, e8,m4,ta,mu
				; CHECK-NEXT: vle1.v v25, (a0)
				; CHECK-NEXT: vse1.v v25, (a1)
				; CHECK-NEXT: ret
				%a = load <vscale x 32 x i1>, <vscale x 32 x i1>* %pa
				store <vscale x 32 x i1> %a, <vscale x 32 x i1>* %pb
				ret void
				}

				define void @test_load_mask_16(<vscale x 16 x i1>* %pa, <vscale x 16 x i1>* %pb) {
				; CHECK-LABEL: test_load_mask_16:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetvli a2, zero, e8,m2,ta,mu
				; CHECK-NEXT: vle1.v v25, (a0)
				; CHECK-NEXT: vse1.v v25, (a1)
				; CHECK-NEXT: ret
				%a = load <vscale x 16 x i1>, <vscale x 16 x i1>* %pa
				store <vscale x 16 x i1> %a, <vscale x 16 x i1>* %pb
				ret void
				}

				define void @test_load_mask_8(<vscale x 8 x i1>* %pa, <vscale x 8 x i1>* %pb) {
				; CHECK-LABEL: test_load_mask_8:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetvli a2, zero, e8,m1,ta,mu
				; CHECK-NEXT: vle1.v v25, (a0)
				; CHECK-NEXT: vse1.v v25, (a1)
				; CHECK-NEXT: ret
				%a = load <vscale x 8 x i1>, <vscale x 8 x i1>* %pa
				store <vscale x 8 x i1> %a, <vscale x 8 x i1>* %pb
				ret void
				}

				define void @test_load_mask_4(<vscale x 4 x i1>* %pa, <vscale x 4 x i1>* %pb) {
				; CHECK-LABEL: test_load_mask_4:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetvli a2, zero, e8,mf2,ta,mu
				; CHECK-NEXT: vle1.v v25, (a0)
				; CHECK-NEXT: vse1.v v25, (a1)
				; CHECK-NEXT: ret
				%a = load <vscale x 4 x i1>, <vscale x 4 x i1>* %pa
				store <vscale x 4 x i1> %a, <vscale x 4 x i1>* %pb
				ret void
				}

				define void @test_load_mask_2(<vscale x 2 x i1>* %pa, <vscale x 2 x i1>* %pb) {
				; CHECK-LABEL: test_load_mask_2:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetvli a2, zero, e8,mf4,ta,mu
				; CHECK-NEXT: vle1.v v25, (a0)
				; CHECK-NEXT: vse1.v v25, (a1)
				; CHECK-NEXT: ret
				%a = load <vscale x 2 x i1>, <vscale x 2 x i1>* %pa
				store <vscale x 2 x i1> %a, <vscale x 2 x i1>* %pb
				ret void
				}

				define void @test_load_mask_1(<vscale x 1 x i1>* %pa, <vscale x 1 x i1>* %pb) {
				; CHECK-LABEL: test_load_mask_1:
				; CHECK: # %bb.0:
				; CHECK-NEXT: vsetvli a2, zero, e8,mf8,ta,mu
				; CHECK-NEXT: vle1.v v25, (a0)
				; CHECK-NEXT: vse1.v v25, (a1)
				; CHECK-NEXT: ret
				%a = load <vscale x 1 x i1>, <vscale x 1 x i1>* %pa
				store <vscale x 1 x i1> %a, <vscale x 1 x i1>* %pb
				ret void
				}

This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Load/Store vector mask types.
ClosedPublic
