This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Teach vsetvli insertion to handle PseudoReadVL.
AbandonedPublic

Authored by fakepaper56 on Apr 12 2022, 2:55 AM.

Details

Summary

VSETVLIInfos right after VLEFF/VLSEGFF are unkown since they modify VL. Unknown
VSETVLIInfos make next vector operations needed to be inserted VSET(I)VLI.

Take the below C code as an example,

vint8m4_t vec_src1 = vle8ff_v_i8m4(str1, &new_vl, vl);
vbool2_t mask1 = vmseq_vx_i8m4_b2(vec_src1, 0, new_vl);

Vsetvli insertion adds a redundant vsetvli for that,

vsetvli a2,a2,e8,m4,ta,mu
vle8ff.v v28,(a0)
csrr a3,vl ; redundant
vsetvli zero,a3,e8,m4,ta,mu ; redundant
vmseq.vi v25,v28,0

The patch uses PseudoReadVL to get the VSETVLIInfos at the location. It may
prevent vsetvli insertion from inserting redundant VSET(I)VLI.

Diff Detail

Event Timeline

fakepaper56 created this revision.Apr 12 2022, 2:55 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 12 2022, 2:55 AM
fakepaper56 requested review of this revision.Apr 12 2022, 2:55 AM
jacquesguan added inline comments.Apr 12 2022, 6:00 AM
llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
57

Maybe no need to save a pointer in the VSETVLIInfo? I think that we only care about if the previous VSETVLIInfo is from a VL modified MI and the current AVL is read from the result of the modified VL. I think maybe we could use a local pointer in emitVSETVLIs or somewhere you would like to use it.

Would it simplify things if the VLEFF pseudo instruction had the GPR output and the vector register output. And we expanded it PseudoReadVL after register allocation?

fakepaper56 added inline comments.Apr 12 2022, 6:50 PM
llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
57

The patch also supports emitVSETVLIPHIs which detects the VSETVLIInfo of predecessors. I think it is hard to introduce "previous" VSETVLIInfo into multiple predecessors.

fakepaper56 added a comment.EditedApr 12 2022, 8:32 PM

Would it simplify things if the VLEFF pseudo instruction had the GPR output and the vector register output. And we expanded it PseudoReadVL after register allocation?

I think we could remove function collectVLCopy() and the table RegToVLModifer if we change the output of VLEFF.

By the way, I think we could just expanding PseudoReadVL in the pass RISCVInsertVSETVLI instead after-RA pass, since scheduling may decrease the live range of the output of PseudoReadVL.

fakepaper56 added a comment.EditedApr 29 2022, 6:30 AM

Would it simplify things if the VLEFF pseudo instruction had the GPR output and the vector register output. And we expanded it PseudoReadVL after register allocation?

Do you think should we change the output of VLEFF and VLSEGFF? I had tried to do that in my local, but I am confused that changing the output seems only benefit in the pass and we need to add more code in tablegen files and RISCVISelDAGToDAG.* than decreased code in the patch.

Would it simplify things if the VLEFF pseudo instruction had the GPR output and the vector register output. And we expanded it PseudoReadVL after register allocation?

Do you think should we change the output of VLEFF and VLSEGFF? I had tried to do that in my local, but I am confused that changing the output seems only benefit in the pass and we need to add more code in tablegen files and RISCVISelDAGToDAG.* than decreased code in the patch.

The InsertVSETVLI pass is complex and buggy. We've had 3 bugs in it in 2022. I want to reduce complexity here if possible. Would adding SEW and LMUL from the VLEFF to the PseudoReadVL that gets emitted for VLEFF help without requiring us to give the VLEFF the GPR output?

fakepaper56 added a comment.EditedMay 1 2022, 12:36 AM

Would adding SEW and LMUL from the VLEFF to the PseudoReadVL that gets emitted for VLEFF help without requiring us to give the VLEFF the GPR output?

I think it is a better solution, we could just check vtype of PseudoReadVL for PseudoReadVL's user and we even don't need the new status VLModified.

fakepaper56 added a comment.EditedMay 1 2022, 6:38 AM

Should I create a new revision for changing PseudoReadVL? Or I just do the change in the revision and also update the commit name and summary?

Should I create a new revision for changing PseudoReadVL? Or I just do the change in the revision and also update the commit name and summary?

Yes, create a new revision. Then we can rebase this patch on top of that new revision.

khchen added a subscriber: khchen.May 3 2022, 12:12 AM

Directly use PseudoReadVL to get VSETVLIInfo also change the commit name and summary.
Additionally, the update removes useless PseudoReadVL in the pass.

fakepaper56 retitled this revision from [RISCV] Teach vsetvli insertion to handle VSETVLIInfo of vl-modified instruction. to [RISCV] Teach vsetvli insertion to handle PseudoReadVL..May 14 2022, 1:06 AM
fakepaper56 edited the summary of this revision. (Show Details)
fakepaper56 edited the summary of this revision. (Show Details)

Coming into this a bit late, but been spending a decent amount of time looking at related code recently.

I am generally not a fan of this patch. The entire PsuedoReadVL mechanism feels like a hack, and the unchecked assumption that the SEW and policy bits on the ReadVL match the prior VLE is worrying. Beyond that, this adds a decent amount of complexity.

I find myself agreeing with @craig.topper's comment above. It really feels like we need a pseudo instruction for VLEnFF itself here. Such a psuedo would produce two outputs, the second being the GPR version of VL. After this pass, we could lower the pseudo into two instructions (e.g. the actual VLEnFF and the CSR read). This would make the state update on this patch very minimal. This really feels to me like the "right" approach here.

An alternative you could pursue is to approximate the simplicity of matching of the pseudo with a local peephole on top of the current split structure. I mocked up a POC of this idea (https://reviews.llvm.org/D126227). It needs some cleanup, but its less invasive/risky than this is. Note that the test diff shows a slightly different impact of this patch. I'm not sure which is correct.

I am generally not a fan of this patch. The entire PsuedoReadVL mechanism feels like a hack, and the unchecked assumption that the SEW and policy bits on the ReadVL match the prior VLE is worrying. Beyond that, this adds a decent amount of complexity.

I find myself agreeing with @craig.topper's comment above. It really feels like we need a pseudo instruction for VLEnFF itself here. Such a psuedo would produce two outputs, the second being the GPR version of VL. After this pass, we could lower the pseudo into two instructions (e.g. the actual VLEnFF and the CSR read). This would make the state update on this patch very minimal. This really feels to me like the "right" approach here.

I agree VTYPE of VLEnFF calculated in two different places is worrying. I am sorry that I didn't think deeply about the two approaches and just thought that modifying PseudoReadVL need less code.

fakepaper56 abandoned this revision.Jun 14 2022, 11:01 PM

The function is done by D127576.