Our current implementation of the InsertVSETVLI dataflow allows phase 3 to arrive at a different block end state than the data flow in phase 1/2 computed. This arises because a block which contains instructions (e.g. load or stores) which don't consume all the incoming bits of the VL/VTYPE can be compatible with multiple incoming states. The algorithm effectively changes the SEW on such instructions, and propagates the prior state forward. As phase 3 uses the block input state for this propagation, but phase 1/2 doesn't, this can result in different block end states.
If we don't correct for it, this discrepancy can result in miscompiles. This was the source of multiple recent changes. However, by now we have fixes for all known correctness issues.
The basic strategy we use is to insert a compensation vsetvli to bring the block state leaving the block back into consistency with the one computed. This is correct, but results in extra vsetvlis being placed at the end of blocks.
This change adjusts the phase 1/2 algorithm to propagate the incoming block state through the block, allowing the compatibility rules to modify the end state. The algorithm may need to run slightly more iterations, but the end result is consistent with what phase 3 does.
The benefit of doing this is two fold.
First, we reverse some of the code quality introductions introduced in the functional fixes.
Second, we simplify the invariants, and allow the strict assertions to be enabled. Several humans, myself included, have found it quite surprising that invariant didn't hold already, and arguably that confusion is the cause of several of our recent miscompiles in this code.
The downside to this patch is that the dataflow may require additional iterations to stabilize. In the worse case, we go from O(Edges) to O(E + UniquePaths) as the incoming state (and thus the outgoing one) can now change once for each path from the entry block.
A couple notes to reviewers:
- The majority of the code improvements are simply reversing the regressions from D125703.
- A prior version of this patch switched from optimistic to conservative data flow. I have dropped that entirely, and don't plan to revisit. It turned out my doing so was accidentally fixing the issue in D125703, and I had not realized that.
This is the last use of /*Strict*/ true - can we follow up on this and get rid of Strict entirely, which was always a patch job?