This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Fix VSETVLI insertion by syncing phases 2 and 3
AbandonedPublic

Authored by frasercrmck on May 5 2022, 9:52 AM.

Details

Summary

This patch attempts to make the VSETVLI insertion pass more stable. A
lot of bugs are ultimately introduced due to phases being out of sync.

This is most notable when we skip VSETVLI insertion in phase 3 by re-using a
previous config, but still rely on phase 2 information which was
calculated without full knowledge of what we're going to skip.

The idea used in this patch is something @rogfer01 suggested in D124089.
By ensuring that VSETVLI "skips" are calculated in one re-usable place
(a new overload of needVSETVLI) we can use it during phases 1 and 2 to
produce more accurate information. In addition, during phase 2, we now
re-compute block-local VTYPE changes once predecessors have been
computed.

Diff Detail

Event Timeline

frasercrmck created this revision.May 5 2022, 9:52 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 5 2022, 9:52 AM
frasercrmck requested review of this revision.May 5 2022, 9:52 AM

adjust comments

Have you tried adding asserts to the end of emitVSETVLIs to ensure the final block state is as expected? I strongly suspect this change is insufficient to fix all the cases.

I've been working on the same problem, but with a different approach. I've been trying to separate the PHI case and the mutation case into a separate phase 4 which works over the fully materialized form. I've got all the transforms (messily) implemented, but even with those out of line, I'm still getting failures of the expected block end state assert. I don't yet know why.

One really nasty case I came across - that I don't think your change handles - is that mutation of the prior vsetvli can change whether the current state is compatible with a following instruction. The compatibility logic includes a bit which looks back at the defining vsetvli for a vreg operand, and since we mutated it, the result can change between phase 1 and phase 3. At the moment, I'm thinking we can't have both that compatibility rule and the mutation in use in the same pass.

Now despite all that, I think your changes here really help in terms of code structure.

llvm/lib/Target/RISCV/RISCVInsertVSETVLI.cpp
1050

This bit on it's own is super important. It's slower, but it avoids the whole need for the merge and Change state entirely.

I think this is a good idea, and probably stands on it's own.

Here (https://reviews.llvm.org/D125035) is the assertion I mentioned above as a flag we can use for testing.

Have you tried adding asserts to the end of emitVSETVLIs to ensure the final block state is as expected? I strongly suspect this change is insufficient to fix all the cases.

It's certainly not enough to fix all cases, as I'm seeing new regressions when testing this internally. Looks like we might be missing coverage on loops. I'll get to work on reducing those failures and committing them. Your asserts might help with that.

I've been working on the same problem, but with a different approach. I've been trying to separate the PHI case and the mutation case into a separate phase 4 which works over the fully materialized form. I've got all the transforms (messily) implemented, but even with those out of line, I'm still getting failures of the expected block end state assert. I don't yet know why.

One really nasty case I came across - that I don't think your change handles - is that mutation of the prior vsetvli can change whether the current state is compatible with a following instruction. The compatibility logic includes a bit which looks back at the defining vsetvli for a vreg operand, and since we mutated it, the result can change between phase 1 and phase 3. At the moment, I'm thinking we can't have both that compatibility rule and the mutation in use in the same pass.

Yeah I only really clocked this mutation thing yesterday. I thought that by including the "mutation" in needVSETVLI (now in phases 1 and 2) we might be able to get away with it being synced up?

Now despite all that, I think your changes here really help in terms of code structure.

Thanks. I do think having the one method to decide the compatibility and using it consistently across phases is logical and less error-prone. Unfortunately it seems we're not able to make what should be an NFC change without breaking things.

rebase on new test which now fails

frasercrmck added inline comments.May 6 2022, 4:31 AM
llvm/test/CodeGen/RISCV/rvv/vsetvli-insert-crossbb.mir
824

This is a bug, but I'm not yet sure if it's dormant and exposed by this patch or actually caused by it.

remove need for TmpStatus (and BBInfo.Change, really)

pre-emptively include D125133

frasercrmck abandoned this revision.May 26 2022, 7:13 AM

This is all largely irrelevant now. I'll open a new revision for the API refactor.