This patch adds a feature to the MVE VPT block insertion pass: it can now remove VPNOT instructions in some circumstances and place the instructions predicated by the VPNOT in an "else" slot of the preceding VPT block instead. This results in more compact code.
For example, this pass used to generate this kind of assembly:
  vldrw.u32 q1, [r5]
  vpt.s32 ge, q1, r2      ; Added by MVE-VPT block insertion pass
  vcmpt.s32 le, q1, r3
  vpnot
  vpst                    ; Added by MVE-VPT block insertion pass
  vstrwt.32 q0, [r5], #16
Now, when the VPNOT's result (stored in VPR) is not needed, and when the preceding VPT block has enough room, the pass generates this instead:
  vldrw.u32 q1, [r5]
  vpte.s32 ge, q1, r2     ; Added by MVE-VPT block insertion pass. Notice the "te" instead of "t"
  vcmpt.s32 le, q1, r3
  vstrwe.32 q0, [r5], #16 ; "t" changed to "e": this instruction is now part of the "else"
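To make the folding rule concrete, here is a minimal Python sketch of how a run of predicated instructions and VPNOTs could be collapsed into a single then/else mask. It is a toy model, not the pass's actual code: the function name build_vpt_mask, the string-based instruction list and the MAX_BLOCK constant are hypothetical. It only captures two constraints: a VPNOT is folded only when a predicated instruction follows it, and a block holds at most four predicated instructions.

```python
from typing import List, Tuple

MAX_BLOCK = 4  # an MVE VPT/VPST block predicates at most four instructions


def build_vpt_mask(instrs: List[str]) -> Tuple[str, List[str]]:
    """Toy model (hypothetical) of folding VPNOTs into the preceding VPT block.

    `instrs` lists what follows the VPT: either "vpnot" or the mnemonic of an
    instruction to predicate. Returns (mask, rest): one 't'/'e' character per
    predicated slot, plus the instructions that were not absorbed.
    """
    mask = ""
    state = "t"  # the first slot of a VPT block is always "then"
    i = 0
    while i < len(instrs) and len(mask) < MAX_BLOCK:
        if instrs[i] == "vpnot":
            nxt = i + 1
            # Only fold a VPNOT that sits between two predicated instructions;
            # the loop condition already guarantees a free slot for the next one.
            if not mask or nxt >= len(instrs) or instrs[nxt] == "vpnot":
                break
            state = "e" if state == "t" else "t"  # VPNOT flips then <-> else
        else:
            mask += state
        i += 1
    return mask, instrs[i:]


mask, rest = build_vpt_mask(["vcmpt.s32", "vpnot", "vstrw.32"])
mnemonic = "vpt" + mask[1:]  # the first 't' is implicit in the mnemonic
print(mnemonic, rest)  # vpte []
```

For the example above the model yields the mask "te", matching the vpte mnemonic; with three instructions before the VPNOT it would produce "ttte" and leave anything beyond the fourth slot for a new block.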
This is shorter (the VPNOT and the VPST are both gone) and should be more efficient. The pass only removes the VPNOT when VPR is used and killed by a single instruction, or when VPR is written to within its block.
This should be enough to avoid losing the result of a VPNOT when it is needed. However, I'm open to suggestions, as I don't know for certain whether it's sufficient.
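The removal condition can likewise be modeled as a small predicate. This is a sketch under assumptions: the VprUse record and its reads_vpr / kills_vpr / writes_vpr flags are hypothetical stand-ins for the register operand information the real pass would inspect on each instruction, and can_remove_vpnot is not an LLVM API. It encodes the two cases literally: VPR is used and killed by exactly one instruction, or VPR is redefined within the block.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VprUse:
    """Hypothetical summary of how one instruction interacts with VPR."""
    reads_vpr: bool = False   # instruction consumes the VPR value
    kills_vpr: bool = False   # this read is the last use of the VPR value
    writes_vpr: bool = False  # instruction redefines VPR


def can_remove_vpnot(block_after_vpnot: List[VprUse]) -> bool:
    """Return True if, under this model, the VPNOT may be dropped.

    Case 1: exactly one instruction reads VPR and that read kills it.
    Case 2: some instruction in the block writes VPR.
    """
    readers = [u for u in block_after_vpnot if u.reads_vpr]
    used_and_killed_once = len(readers) == 1 and readers[0].kills_vpr
    redefined_in_block = any(u.writes_vpr for u in block_after_vpnot)
    return used_and_killed_once or redefined_in_block
```

A VPR value that is read without being killed and never overwritten is treated as live, so the VPNOT is kept in that case.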
Is this only called with a Count of 1 now? If so, can it be simplified?