This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Load/Store vector mask types.
ClosedPublic

Authored by HsiangKai on Dec 15 2020, 8:48 PM.

Details

Summary

Use vle1.v/vse1.v to load/store vector mask types.

Diff Detail

Event Timeline

HsiangKai created this revision.Dec 15 2020, 8:48 PM
HsiangKai requested review of this revision.Dec 15 2020, 8:48 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 15 2020, 8:48 PM
Herald added a subscriber: MaskRay. · View Herald Transcript
HsiangKai updated this revision to Diff 312095.Dec 15 2020, 8:52 PM
craig.topper added inline comments.Dec 16 2020, 11:49 AM
llvm/test/CodeGen/RISCV/rvv/load-mask.ll
15

How do we ensure that the location we're loading/storing is the right size for this? The mask is (vlenb/8) * 64 * 1 bits. But the load/store size is (vlenb/8)*64*8 bits.

rogfer01 added inline comments.Dec 16 2020, 2:15 PM
llvm/test/CodeGen/RISCV/rvv/load-mask.ll
15

I imagine we could set vl=max(1,vlenb/(8*8)),sew=8 in this case rather than vl=vlmax,sew=8. We still have to load/store at least one i8 (hence the max). This is what already happens when a scalar load/store of i1 appears in IR.

However I'm not sure whether this scenario in IR will happen very often. If it doesn't then I imagine the slightly less straightforward code generation may be OK?

rogfer01 added inline comments.Dec 16 2020, 2:17 PM
llvm/test/CodeGen/RISCV/rvv/load-mask.ll
15

I forgot to account lmul, so I think a reasonable vl would be vl=max(1, (vlenb*lmul)/(8*8)))

HsiangKai added inline comments.Dec 16 2020, 10:25 PM
llvm/test/CodeGen/RISCV/rvv/load-mask.ll
15

I configure the load/store using e8,m1. The load/store size is (vlenb/8)*8*8 bits, not (vlenb/8)*64*8. (vlenb/8)*64*8 is e8,m8. You could image that I treat the load/store as load/store <vscale x 8 x i8> type.

Why do we need to consider LMUL here? All mask types are stored in one vector registers. All the load/store for mask types should use PseudoVLE#sew#_V_M1/PseudoVSE#sew#_V_M1.

We will reserve a whole vector register size in stack for mask types. Use e8,m1 and vl=VLMAX should be able to correctly read out the mask values.

craig.topper added inline comments.Dec 16 2020, 10:35 PM
llvm/test/CodeGen/RISCV/rvv/load-mask.ll
15

You're right I did do that math wrong. So the vscale x 64 x i1 is ok. But we're using e8,m1 with vlmax for types smaller than vscale x 64 x i1 as well right?

When you say "We will reserve a whole vector register size in stack for mask types." You mean for spills and reloads? That's a different case than these IR tests right?

HsiangKai added inline comments.Dec 16 2020, 11:04 PM
llvm/test/CodeGen/RISCV/rvv/load-mask.ll
15

I think so. For vscale x 32 x i1, we still use e8,m1 with vlmax to read the whole vector out.

Yeah, what in my mind is spilling and argument passing through stack. We have not implemented frame handling in the upstream. So, I created the test cases in this way. I will prepare the frame handling for RISC-V V later.

D93368 will depend on this commit. For example, there are four vector arguments in the masked version of vmseq, i.e., maskedoff, varg0, varg1, mask. When LMUL = 4, we could pass the arguments varg0 and varg1 through vector registers. The first mask type argument will be put in v0. The second mask type argument, i.e., mask, will pass through stack. The address will be stored in the GPR. We need to load the mask value from the stack. Vector argument passing is another issue. We could create another patch for it.

D93368 will depend on this commit. For example, there are four vector arguments in the masked version of vmseq, i.e., maskedoff, varg0, varg1, mask. When LMUL = 4, we could pass the arguments varg0 and varg1 through vector registers. The first mask type argument will be put in v0. The second mask type argument, i.e., mask, will pass through stack. The address will be stored in the GPR. We need to load the mask value from the stack. Vector argument passing is another issue. We could create another patch for it.

Are you saying the tests in D93368 don't pass without this commit?

D93368 will depend on this commit. For example, there are four vector arguments in the masked version of vmseq, i.e., maskedoff, varg0, varg1, mask. When LMUL = 4, we could pass the arguments varg0 and varg1 through vector registers. The first mask type argument will be put in v0. The second mask type argument, i.e., mask, will pass through stack. The address will be stored in the GPR. We need to load the mask value from the stack. Vector argument passing is another issue. We could create another patch for it.

Are you saying the tests in D93368 don't pass without this commit?

Yes, I mean the test cases for LMUL = 4 and LMUL = 8 will not pass without this commit.

D93368 will depend on this commit. For example, there are four vector arguments in the masked version of vmseq, i.e., maskedoff, varg0, varg1, mask. When LMUL = 4, we could pass the arguments varg0 and varg1 through vector registers. The first mask type argument will be put in v0. The second mask type argument, i.e., mask, will pass through stack. The address will be stored in the GPR. We need to load the mask value from the stack. Vector argument passing is another issue. We could create another patch for it.

Are you saying the tests in D93368 don't pass without this commit?

Yes, I mean the test cases for LMUL = 4 and LMUL = 8 will not pass without this commit.

My concern is that I'm not convinced that something like this is correct

%a = alloca <vscale x 32 x i1>
%b = store <vscale x 32 x i1> %c, <vscale x 32 x i1>* %a

getTypeSize for that alloca woud return a scalable result with a fixed size of 32. Would we still end up allocating (vlen / 8) * 64 bytes on the stack for that. Or would it be (vlen / 8) * 32 bytes?

D93368 will depend on this commit. For example, there are four vector arguments in the masked version of vmseq, i.e., maskedoff, varg0, varg1, mask. When LMUL = 4, we could pass the arguments varg0 and varg1 through vector registers. The first mask type argument will be put in v0. The second mask type argument, i.e., mask, will pass through stack. The address will be stored in the GPR. We need to load the mask value from the stack. Vector argument passing is another issue. We could create another patch for it.

Are you saying the tests in D93368 don't pass without this commit?

Yes, I mean the test cases for LMUL = 4 and LMUL = 8 will not pass without this commit.

My concern is that I'm not convinced that something like this is correct

%a = alloca <vscale x 32 x i1>
%b = store <vscale x 32 x i1> %c, <vscale x 32 x i1>* %a

getTypeSize for that alloca woud return a scalable result with a fixed size of 32. Would we still end up allocating (vlen / 8) * 64 bytes on the stack for that. Or would it be (vlen / 8) * 32 bytes?

We could use the size of one vector register as the unit to allocate stack for RVV objects. Even the size of mask type variable is smaller than one vector register. We could allocate one vector register size for it. It is similar for fractional LMUL variables. There are whole register load/store in v1.0 to access RVV objects in the stack for LMUL = 1, 2, 4, 8.

In our downstream implementation, there is a snippet to calculate how many vectors to reserve in the stack for RVV objects.

int64_t ObjectSize = MFI.getObjectSize(FI);

unsigned ShiftAmount;
// Mask objects may be logically smaller than the spill size of the VR
// class.
if (ObjectSize <= TRI->getSpillSize(RISCV::VRRegClass))
  ShiftAmount = 0;
else if (ObjectSize == TRI->getSpillSize(RISCV::VRM2RegClass))
  ShiftAmount = 1;
else if (ObjectSize == TRI->getSpillSize(RISCV::VRM4RegClass))
  ShiftAmount = 2;
else if (ObjectSize == TRI->getSpillSize(RISCV::VRM8RegClass))
  ShiftAmount = 3;
else
  llvm_unreachable("Unexpected object size");
HsiangKai updated this revision to Diff 313818.Dec 27 2020, 8:07 PM

Rebase on master.

To consider the frame handling in D93750, is it reasonable to load/store whole vector registers for mask types regardless which kind of mask types?

HsiangKai updated this revision to Diff 320628.Feb 1 2021, 3:45 PM

Use vle1.v/vse1.v to load/store mask types.

frasercrmck accepted this revision.Feb 2 2021, 1:47 AM

LGTM. There will be a conflict with D95844 so be careful.

This revision is now accepted and ready to land.Feb 2 2021, 1:47 AM

This patch uses vle1.v/vse1.v. It depends on D95781. I will commit it after D95781 is accepted.

This revision was landed with ongoing or failed builds.Feb 2 2021, 9:44 PM
This revision was automatically updated to reflect the committed changes.