This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Use NoReg in place of IMPLICIT_DEF for undefined passthru operands
ClosedPublic

Authored by reames on Aug 2 2023, 8:53 AM.

Details

Summary

In a recent series of refactorings (described here: https://discourse.llvm.org/t/riscv-transition-in-vector-pseudo-structure-policy-variants/71295), I greatly increased the number of IMPLICIT_DEF operands to our vector instructions. This has turned out to have an unexpected negative impact because MachineCSE does not CSE IMPLICIT_DEFs, and thus does not CSE any instruction with an IMPLICIT_DEF operand. SelectionDAG *does* CSE the same case, but that only covers the same block case, not the cross block case. This lead to the performance regression reported in https://github.com/llvm/llvm-project/issues/64282.

This change is a slightly ugly hack to side step the issue. Instead of fixing the root cause (lack of CSE for IMPLICIT_DEF) or undoing the operand changes, we leave the extra operand in place, and use NoReg in place of IMPLICIT_DEF. I then convert back to IMPLICIT_DEF just before register allocation so that ProcessImplicitDefs and TwoAddressInstructions can do the normal transforms to Undef tied registers.

We may end up backporting this into the 17.x release branch.

Reviewers, please pay close attention to the test diffs. These are mostly schedule perturbations caused by the removal of the IMPLICIT_DEF instruction - basically undoing the changes from the refactoring sequence - but there could be other issues lurking within. I already found two correctness issues with earlier versions of the patch, and may have missed others.

Diff Detail

Event Timeline

reames created this revision.Aug 2 2023, 8:53 AM
reames requested review of this revision.Aug 2 2023, 8:53 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 2 2023, 8:53 AM
reames edited the summary of this revision. (Show Details)Aug 2 2023, 8:54 AM
craig.topper added inline comments.Aug 2 2023, 10:37 AM
llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
3245

Stray change

3606

intend -> intent?

llvm/lib/Target/RISCV/RISCVInstrInfoVPseudos.td
57

thought -> though?

reames updated this revision to Diff 546619.Aug 2 2023, 2:58 PM

Address review comments

asb added a comment.EditedAug 9 2023, 6:22 AM

Some high level review comments:

  • The intended change and the problem being solved is very clear from the patch description and the bug description - thanks!
  • One minor request is that for the code comments, any time you refer to e.g. a "MachineCSE problem" (as in the comment in RISCVRVVInitUndef.cpp) please be more explicit that this is an issue related to the quality of generated code rather than correctness. Just referencing the bug number would be fine. This will be helpful for future code readers if this workaround lives longer than expected.
  • The approach implemented here seems sensible + pragmatic to me. Though my confidence of spotting a lurking subtle issue here is low - so it will be great to get input from other reivewers too.
  • Given we don't have concrete examples of this causing _big_ problems I think it wouldn't be the end of the world if this wasn't in 17.0.0, but I see the argument for backporting it and overall am probably slightly in favour. If it's going to happen, I think this needs reviewing and landing soon so there's at least a full release candidate cycle including it, giving a chance to back it out if unexpected problems are found.

Summary: I don't spot any issues and this seems sensible to me, but I think other reviewers on the list who work on vector codegen more are better placed to do a detailed line-by-line review.

luke added inline comments.Aug 9 2023, 7:36 AM
llvm/test/CodeGen/RISCV/rvv/fixed-vectors-bswap-vp.ll
440

Possible regression here. Are the NoRegs coming into the InsertVSETVLI pass?

reames added inline comments.Aug 9 2023, 7:02 PM
llvm/test/CodeGen/RISCV/rvv/fixed-vectors-bswap-vp.ll
440

This turns out to be a known issue showing up in a slightly surprising way.

The cause is that InsertVSETVLI contains a peephole for VL=1 splats with tail undefined which is only applied in the forward direction - due to staleness issues.

The scheduling changes causing the two vmv.v.i instructions to be reordered. So previously the VL=4 e32 comes before the VL=1 e8. Afterward, the inverse order reaches InsertVSETVLI and the forward rule doesn't trigger.

The fix here would be to always apply peepholes in both directions, but that requires careful reasoning about working with stale MI. I'll think about that separately, but do not consider this a blocker for this patch.

reames updated this revision to Diff 549977.Aug 14 2023, 8:53 AM

Update comments as requested by @asb. Sorry for the delay, I'd missed that request in your comment.

At this point, I do not plan to request a backport for this change. I'd still like to get it landed, but it's too high risk to backport late in a release cycle, and at the moment, none of my performance data indicates a strong need to do so.

craig.topper added inline comments.Aug 14 2023, 9:38 AM
llvm/test/CodeGen/RISCV/rvv/cross-block-cse.ll
3–4

This TODO needs to be dropped

reames updated this revision to Diff 549998.Aug 14 2023, 9:41 AM

Address comment from @craig.topper

craig.topper added inline comments.Aug 14 2023, 9:48 AM
llvm/test/CodeGen/RISCV/rvv/ctpop-vp.ll
2787

I think I'm missing how this change comes from this patch. The sltu+addi+and looks like an optimized select lowering. The old code use a branch for the select. It looks to me like all of the changes for this patch should be after select lowering.

craig.topper added inline comments.Aug 14 2023, 10:01 AM
llvm/test/CodeGen/RISCV/rvv/ctpop-vp.ll
2787

I guess its just a scheduling difference due to the splitting. The select is further down on this side and the sltu is further down on the other side.

reames added inline comments.Aug 14 2023, 10:04 AM
llvm/test/CodeGen/RISCV/rvv/ctpop-vp.ll
2787

That is a very good question!

However, I think the answer is that it isn't actually select lowering, it's scheduling run slightly amuck. If you look quite a bit further down the test diff, you see what appears to be a select-to-branch conversion going in the other direction.

I think what we actually have here is a massive rescheduling. However, that would seem to require cross block scheduling, which too my knowledge, we don't have? So maybe I'm still confused?

2897–2900

Here

This revision is now accepted and ready to land.Aug 14 2023, 11:02 AM
This revision was landed with ongoing or failed builds.Aug 14 2023, 1:01 PM
This revision was automatically updated to reflect the committed changes.