Page MenuHomePhabricator

[ARM] Common inverse constant predicates to VPNOT
ClosedPublic

Authored by dmgreen on Dec 2 2020, 2:11 AM.

Details

Summary

This scans through blocks looking for constants used as predicates in MVE instructions. When two constants are found which are the inverse of one another, the second can be replaced by a VPNOT of the first, potentially allowing that not to be folded away into an else predicate of a vpt block.

Diff Detail

Event Timeline

dmgreen created this revision.Dec 2 2020, 2:11 AM
dmgreen requested review of this revision.Dec 2 2020, 2:11 AM

Should there be a range limit somewhere in this logic, beyond just 'in the same basic block'? I worry slightly that if there's a very long basic block with two completely separate predicated sections, and they happen to reuse the same constant, then it might not be a win to do this change. Something along the lines of

%mask1 = [complicated constant]
%mask2 = [exactly the inverse constant]
[do a thing predicated on %mask1]
[do a thing predicated on %mask2]
[10 instructions of unrelated code not VPT-predicated at all]
%mask3 = [same constant as %mask2]
[do a thing predicated on %mask3]

In that situation, it's fine to turn the use of %mask2 into a VPNOT of %mask1, so that the first VPT block has a T and then an E instruction. But you'd probably prefer that %mask3 was rematerialized from scratch rather than (as I understand that this code would do) preserving %mask1 from ages ago in order to complement it again.

dmgreen updated this revision to Diff 310116.Dec 8 2020, 2:19 AM

Our VPT block creation certainly isn't optimal in a lot of cases, that is very true. I think in general a VPNOT is going to be better or equal to a MOV;VMSR pair, so long ranges or intervening instructions are probably OK. If there is a different mask used between the two though, that case would be better to not try and use VPNOTs of previous values that just need to be spilled and reloaded.

I have made this only track a single predicate at a time, invalidating it as we scan through and reusing it or a VPNOT of it as we can.

My original motivating case for this isn't really a lot better - unfortunately scheduling comes along ruins everything. I may have to add something extra to sort that out too. The test cases here (along with the newly added ones) look OK though, baring one that has some strange register allocation.

simon_tatham accepted this revision.Dec 8 2020, 2:45 AM

That seems like a reasonable precaution to me, yes – with only one VPR, there's only one previous value that it might be of obviously low cost to VPNOT.

This revision is now accepted and ready to land.Dec 8 2020, 2:45 AM
This revision was automatically updated to reflect the committed changes.