This scans through blocks looking for constants used as predicates in MVE instructions. When two constants are found which are the inverse of one another, the second can be replaced by a VPNOT of the first, potentially allowing that not to be folded away into an else predicate of a vpt block.
Details
Diff Detail
Event Timeline
Should there be a range limit somewhere in this logic, beyond just 'in the same basic block'? I worry slightly that if there's a very long basic block with two completely separate predicated sections, and they happen to reuse the same constant, then it might not be a win to do this change. Something along the lines of
%mask1 = [complicated constant] %mask2 = [exactly the inverse constant] [do a thing predicated on %mask1] [do a thing predicated on %mask2] [10 instructions of unrelated code not VPT-predicated at all] %mask3 = [same constant as %mask2] [do a thing predicated on %mask3]
In that situation, it's fine to turn the use of %mask2 into a VPNOT of %mask1, so that the first VPT block has a T and then an E instruction. But you'd probably prefer that %mask3 was rematerialized from scratch rather than (as I understand that this code would do) preserving %mask1 from ages ago in order to complement it again.
Our VPT block creation certainly isn't optimal in a lot of cases, that is very true. I think in general a VPNOT is going to be better or equal to a MOV;VMSR pair, so long ranges or intervening instructions are probably OK. If there is a different mask used between the two though, that case would be better to not try and use VPNOTs of previous values that just need to be spilled and reloaded.
I have made this only track a single predicate at a time, invalidating it as we scan through and reusing it or a VPNOT of it as we can.
My original motivating case for this isn't really a lot better - unfortunately scheduling comes along ruins everything. I may have to add something extra to sort that out too. The test cases here (along with the newly added ones) look OK though, baring one that has some strange register allocation.
That seems like a reasonable precaution to me, yes – with only one VPR, there's only one previous value that it might be of obviously low cost to VPNOT.