Use vle1.v/vse1.v to load/store vector mask types.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/test/CodeGen/RISCV/rvv/load-mask.ll | ||
---|---|---|
15 | How do we ensure that the location we're loading/storing is the right size for this? The mask is (vlenb/8) * 64 * 1 bits. But the load/store size is (vlenb/8)*64*8 bits. |
llvm/test/CodeGen/RISCV/rvv/load-mask.ll | ||
---|---|---|
15 | I imagine we could set vl=max(1,vlenb/(8*8)),sew=8 in this case rather than vl=vlmax,sew=8. We still have to load/store at least one i8 (hence the max). This is what already happens when a scalar load/store of i1 appears in IR. However I'm not sure whether this scenario in IR will happen very often. If it doesn't then I imagine the slightly less straightforward code generation may be OK? |
llvm/test/CodeGen/RISCV/rvv/load-mask.ll | ||
---|---|---|
15 | I forgot to account lmul, so I think a reasonable vl would be vl=max(1, (vlenb*lmul)/(8*8))) |
llvm/test/CodeGen/RISCV/rvv/load-mask.ll | ||
---|---|---|
15 | I configure the load/store using e8,m1. The load/store size is (vlenb/8)*8*8 bits, not (vlenb/8)*64*8. (vlenb/8)*64*8 is e8,m8. You could image that I treat the load/store as load/store <vscale x 8 x i8> type. Why do we need to consider LMUL here? All mask types are stored in one vector registers. All the load/store for mask types should use PseudoVLE#sew#_V_M1/PseudoVSE#sew#_V_M1. We will reserve a whole vector register size in stack for mask types. Use e8,m1 and vl=VLMAX should be able to correctly read out the mask values. |
llvm/test/CodeGen/RISCV/rvv/load-mask.ll | ||
---|---|---|
15 | You're right I did do that math wrong. So the vscale x 64 x i1 is ok. But we're using e8,m1 with vlmax for types smaller than vscale x 64 x i1 as well right? When you say "We will reserve a whole vector register size in stack for mask types." You mean for spills and reloads? That's a different case than these IR tests right? |
llvm/test/CodeGen/RISCV/rvv/load-mask.ll | ||
---|---|---|
15 | I think so. For vscale x 32 x i1, we still use e8,m1 with vlmax to read the whole vector out. Yeah, what in my mind is spilling and argument passing through stack. We have not implemented frame handling in the upstream. So, I created the test cases in this way. I will prepare the frame handling for RISC-V V later. |
D93368 will depend on this commit. For example, there are four vector arguments in the masked version of vmseq, i.e., maskedoff, varg0, varg1, mask. When LMUL = 4, we could pass the arguments varg0 and varg1 through vector registers. The first mask type argument will be put in v0. The second mask type argument, i.e., mask, will pass through stack. The address will be stored in the GPR. We need to load the mask value from the stack. Vector argument passing is another issue. We could create another patch for it.
Yes, I mean the test cases for LMUL = 4 and LMUL = 8 will not pass without this commit.
My concern is that I'm not convinced that something like this is correct
%a = alloca <vscale x 32 x i1> %b = store <vscale x 32 x i1> %c, <vscale x 32 x i1>* %a
getTypeSize for that alloca woud return a scalable result with a fixed size of 32. Would we still end up allocating (vlen / 8) * 64 bytes on the stack for that. Or would it be (vlen / 8) * 32 bytes?
We could use the size of one vector register as the unit to allocate stack for RVV objects. Even the size of mask type variable is smaller than one vector register. We could allocate one vector register size for it. It is similar for fractional LMUL variables. There are whole register load/store in v1.0 to access RVV objects in the stack for LMUL = 1, 2, 4, 8.
In our downstream implementation, there is a snippet to calculate how many vectors to reserve in the stack for RVV objects.
int64_t ObjectSize = MFI.getObjectSize(FI); unsigned ShiftAmount; // Mask objects may be logically smaller than the spill size of the VR // class. if (ObjectSize <= TRI->getSpillSize(RISCV::VRRegClass)) ShiftAmount = 0; else if (ObjectSize == TRI->getSpillSize(RISCV::VRM2RegClass)) ShiftAmount = 1; else if (ObjectSize == TRI->getSpillSize(RISCV::VRM4RegClass)) ShiftAmount = 2; else if (ObjectSize == TRI->getSpillSize(RISCV::VRM8RegClass)) ShiftAmount = 3; else llvm_unreachable("Unexpected object size");
To consider the frame handling in D93750, is it reasonable to load/store whole vector registers for mask types regardless which kind of mask types?
How do we ensure that the location we're loading/storing is the right size for this? The mask is (vlenb/8) * 64 * 1 bits. But the load/store size is (vlenb/8)*64*8 bits.