This patch combines a VCMP followed by a VPST into a single VPT, which has the same semantics as the two instructions together.
Shouldn't you be searching for any VCMP opcode? RDA would be a nicer way of finding the VPR def, but that shouldn't be necessary anyway - I'm pretty certain the VCMP should be the 'Divergent' instruction.
Probably best not to run at -O3, just in case upstream/downstream have different optimisation pipelines.
Checking the Divergent is much better, thanks. I also realised that I wasn't decrementing I when the ++I == E check failed, so this new logic is more robust.
The VPT block pass has some very similar code. Do we need to check that operands 1 and 2 of the VCMP have not been modified between the VCMP and the point where we materialize the VPT?
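
To make the question concrete, here is a rough sketch of the kind of check I have in mind. It is a toy model, not LLVM code: `ToyInstr` and `operandsUnmodifiedBetween` are hypothetical names, registers are plain strings, and each instruction has at most one def. The real pass would work on MachineInstr operands instead.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a machine instruction: at most one register
// written (Def) and a list of registers read (Uses).
struct ToyInstr {
  std::string Opcode;
  std::string Def;               // empty if the instruction defines nothing
  std::vector<std::string> Uses; // registers read by the instruction
};

// Returns true if no register read by Block[CmpIdx] is redefined by any
// instruction strictly between CmpIdx and SinkIdx. SinkIdx is the position
// where the combined instruction would be materialized.
bool operandsUnmodifiedBetween(const std::vector<ToyInstr> &Block,
                               std::size_t CmpIdx, std::size_t SinkIdx) {
  // Reject out-of-range or reversed positions rather than guessing.
  if (CmpIdx >= Block.size() || SinkIdx > Block.size() || CmpIdx >= SinkIdx)
    return false;

  for (std::size_t I = CmpIdx + 1; I < SinkIdx; ++I) {
    if (Block[I].Def.empty())
      continue;
    for (const std::string &Use : Block[CmpIdx].Uses)
      if (Block[I].Def == Use)
        return false; // an operand of the compare is clobbered in between
  }
  return true;
}
```

If the check fails, the compare's operands no longer hold the same values at the sink point, so folding it into the later instruction would change behaviour.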