With predicated false lanes now tracked to guarantee zeros in the false bytes, we can allow vmaxv.u* and vmaxav instructions to be tail predicated if the argument requirements are met.
The unit test failure looks genuine; does that need fixing?
Ah, I haven't updated the unit test in D76708.
For min/max, we can't support an implicit vmin, because the results may not be the same after the conversion. Say we only have three 32-bit elements left to process, and the fourth (false) element is the leftmost 0x00:
| Opcode | Input | Original result | Tail-predicated result |
|---|---|---|---|
| VMAXV.u32 | 0x00010203 | 0x03 | 0x03 |
| VMINV.u32 | 0x00010203 | 0x00 | 0x01 |
The tail-predicated instruction ignores the predicated-off (false) lanes/bytes, whereas the original does not.
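The table can be reproduced with a small, hypothetical Python model of a horizontal reduction. It is a simplified sketch, not the MVE semantics: lanes are plain integers, the fourth lane is the false lane holding zero, and any scalar operand the real instructions take is ignored. The unpredicated form reduces over every lane; the tail-predicated form reduces over the active lanes only.

```python
# Hypothetical, simplified model of a horizontal reduction over four lanes.
# The fourth lane is the false (predicated-off) lane and holds zero.
lanes = [0x03, 0x02, 0x01, 0x00]
active = [True, True, True, False]


def reduce_unpredicated(op, values):
    """Original loop body: every lane, including the false lane, takes part."""
    return op(values)


def reduce_tail_predicated(op, values, mask):
    """Tail-predicated form: only the active lanes take part."""
    return op([v for v, m in zip(values, mask) if m])


vmaxv_orig = reduce_unpredicated(max, lanes)
vmaxv_tp = reduce_tail_predicated(max, lanes, active)
vminv_orig = reduce_unpredicated(min, lanes)
vminv_tp = reduce_tail_predicated(min, lanes, active)

print(f"VMAXV.u32: original={vmaxv_orig:#04x} tail-predicated={vmaxv_tp:#04x}")
print(f"VMINV.u32: original={vminv_orig:#04x} tail-predicated={vminv_tp:#04x}")
```

The max results agree because the zero false lane never wins; the min results differ because the zero false lane always wins in the original.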
We also only support unsigned values, because we know that the 'FalseLaneZeros' can't interfere with the result: a zero false lane can only be the answer if all the other elements are also zero. This is not true for signed values, where the false-lane zero may be the largest value (for example, when every active element is negative).
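The unsigned/signed argument can be checked with a short Python sketch. It is illustrative only: it samples 8-bit lane values rather than testing exhaustively, and it models VMAXAV simply as a maximum over absolute values, where |0| is the smallest possible magnitude. `max_unchanged` is a hypothetical helper that asks whether adding the zeroed false lane changes the maximum.

```python
import itertools

FALSE_LANE = 0  # value guaranteed in the predicated-off lane


def max_unchanged(active_lanes, key=lambda v: v):
    """True if adding the zeroed false lane leaves the maximum unchanged."""
    with_false = max(key(v) for v in active_lanes + [FALSE_LANE])
    without_false = max(key(v) for v in active_lanes)
    return with_false == without_false


# Unsigned 8-bit lanes (sampled): zero is the smallest value, so it never wins.
unsigned_ok = all(
    max_unchanged(list(t)) for t in itertools.product(range(0, 256, 17), repeat=3)
)

# Signed 8-bit lanes: if every active lane is negative, the false zero wins.
signed_counterexample = [-5, -2, -7]
signed_ok = max_unchanged(signed_counterexample)

# Absolute-value max (modelling VMAXAV): |0| is the smallest magnitude.
abs_ok = all(
    max_unchanged(list(t), key=abs)
    for t in itertools.product(range(-128, 128, 17), repeat=3)
)

print("unsigned max safe:", unsigned_ok)
print("signed max safe for", signed_counterexample, "->", signed_ok)
print("absolute max safe:", abs_ok)
```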
Thanks for that example! I asked this question because I expected vmax and vmin to behave roughly the same. In your example, if you change the input in the example from 0x00010203 to 0x04010203, then VMAX will also give a different result after tail predication, or am I still missing something?
> if you change the input in the example from 0x00010203 to 0x04010203, then the VMAX will also give a different result after tail-predication
Indeed! That is why we track which registers have zeroed false lanes. Because vmax is a horizontal operation, we check that it operates on registers that we know have zeroed false lanes.
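The 0x04010203 case and the reason for the tracking can be sketched in Python. This is a hypothetical model, not LLVM's implementation: `vmaxv_u`, `allow_tail_predication`, the opcode strings, and the register names are all illustrative assumptions. A stale 0x04 in the false lane breaks equivalence, a known-zero false lane preserves it, and the check only permits tail predication for a max reduction whose operand is known to have zeroed false lanes.

```python
# Hypothetical model; all names here are illustrative, not LLVM's API.
ACTIVE = [True, True, True, False]  # three active lanes, fourth lane is false


def vmaxv_u(lanes, mask=None):
    """Unsigned max across lanes; with a mask, only active lanes count."""
    if mask is not None:
        lanes = [v for v, m in zip(lanes, mask) if m]
    return max(lanes)


def tp_is_equivalent(lanes):
    """Does tail predication preserve the original VMAXV.u result?"""
    return vmaxv_u(lanes) == vmaxv_u(lanes, ACTIVE)


def allow_tail_predication(opcode, operand, zero_false_lane_regs):
    """Hypothetical legality check: only unsigned/absolute max reductions,
    and only when the operand register is known to have zeroed false lanes."""
    safe_opcodes = {"VMAXV.u8", "VMAXV.u16", "VMAXV.u32", "VMAXAV"}
    return opcode in safe_opcodes and operand in zero_false_lane_regs


stale = [0x03, 0x02, 0x01, 0x04]   # false lane holds leftover data (0x04)
zeroed = [0x03, 0x02, 0x01, 0x00]  # false lane known to be zero

print("stale false lane equivalent:", tp_is_equivalent(stale))
print("zeroed false lane equivalent:", tp_is_equivalent(zeroed))

known_zero = {"q1"}  # hypothetical result of the false-lane tracking
print(allow_tail_predication("VMAXV.u32", "q1", known_zero))
print(allow_tail_predication("VMAXV.u32", "q2", known_zero))
print(allow_tail_predication("VMINV.u32", "q1", known_zero))
```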
Same for these three (why are they not suitable for TP?)