This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Add a test showing incorrect VSETVLI insertion
ClosedPublic

Authored by frasercrmck on Apr 20 2022, 6:39 AM.

Details

Summary

This test shows incorrect cross-bb insertion. We'd expect to see
a SEW=8 vsetvli, something like:

vsetvli zero, zero, e8, mf8, ta, mu
vluxei64.v      v1, (a2), v8, v0.t

But instead the vsetvli is omitted and instead an inherited SEW=64
vsetvli is used:

    vmv1r.v v9, v1
    vsetvli a3, zero, e64, m1, ta, mu
    vmseq.vi        v9, v1, 0
    vmv1r.v v8, v0
    vmandn.mm       v0, v9, v2
    beqz    a0, .LBB0_2
# %bb.1:
    vluxei64.v      v1, (a2), v8, v0.t
    vmv1r.v v3, v1

The "mask reg op" vmandn.mm in bb.1 appears to be confusing the insertion
process, as it is able to elide its own vsetvli as its VLMAX (SEW=8,
LMUL=MF8) is identical to the previous one (SEW=64, LMUL=1).

Diff Detail

Event Timeline

frasercrmck created this revision.Apr 20 2022, 6:39 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 20 2022, 6:39 AM
frasercrmck requested review of this revision.Apr 20 2022, 6:39 AM
frasercrmck edited the summary of this revision. (Show Details)Apr 20 2022, 6:41 AM

It appears as if this is what D119518 is trying to achieve, but since we're not inserting any VSETVLIs in bb.1, CurInfo isn't valid and so we skip the insertion. I tried making it insert vsetvlis if CurInfo is invalid (out of safety) but that's perhaps too pessimistic, as then it inserts a VSETVLI at the end of bb.2 as well. We don't have the scope during phase 3 to work out that the emergency VSETVLI inserted at the end of bb.1 would cover that.

Unless we do something naive like that then try and remove them later? We're obviously at the limits of what this pass can do so I don't know if that's just another short-term solution.

rogfer01 added a comment.EditedApr 21 2022, 12:20 AM

One interesting thing is that computeIncomingVLVTYPE doesn't seem to be fully aligned with what emitVSETVLIs will do. If the latter chooses to skip a vsetvl then the Exit of that basic block might potentially be different to the one that we determined in computeVLVTYPEChanges and computeIncomingVLVTYPE.

Maybe aligning computeIncomingVLVTYPE with the expectations of emitVSETVLIs is possible. Looks like once we have computed InInfo in computeIncomingVLVTYPE we may have to run again computeVLVTYPEChanges for that block (and there make sure we use the same skip criteria as in emitVSETVLIs). The latter would now receive the InInfo (in contrast to Phase 1 where it'd be unknown) and it would compute a potentially different Exit value. This also suggests that Phase 1 might be embedded as part of Phase 2 once we have computed the InInfo. This might make the algorithm a bit slower.

D119518 mitigates the lack of alignment by reconciling both but it means that in your case (if we do CurInfo = BlockInfo[MBB.getNumber()].Pred; for the case in which we can skip a vsetvli due to the predecessors) we get an unnecessary change at the end of bb1 which we'd want to have in bb2 instead.

rebase on vtype/sew mir changes

This revision is now accepted and ready to land.Apr 23 2022, 3:27 PM
This revision was landed with ongoing or failed builds.May 4 2022, 6:38 AM
This revision was automatically updated to reflect the committed changes.

One interesting thing is that computeIncomingVLVTYPE doesn't seem to be fully aligned with what emitVSETVLIs will do. If the latter chooses to skip a vsetvl then the Exit of that basic block might potentially be different to the one that we determined in computeVLVTYPEChanges and computeIncomingVLVTYPE.

Maybe aligning computeIncomingVLVTYPE with the expectations of emitVSETVLIs is possible. Looks like once we have computed InInfo in computeIncomingVLVTYPE we may have to run again computeVLVTYPEChanges for that block (and there make sure we use the same skip criteria as in emitVSETVLIs). The latter would now receive the InInfo (in contrast to Phase 1 where it'd be unknown) and it would compute a potentially different Exit value. This also suggests that Phase 1 might be embedded as part of Phase 2 once we have computed the InInfo. This might make the algorithm a bit slower.

D119518 mitigates the lack of alignment by reconciling both but it means that in your case (if we do CurInfo = BlockInfo[MBB.getNumber()].Pred; for the case in which we can skip a vsetvli due to the predecessors) we get an unnecessary change at the end of bb1 which we'd want to have in bb2 instead.

Sorry for not replying earlier @rogfer01, but I wanted to thank you for your thoughts. After a bit of a break, I am about to dig in to see how best to fix it. The fact that the phases see different information is really not ideal.