This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Allow tail predication of VLDn
ClosedPublic

Authored by dmgreen on Aug 15 2020, 10:55 AM.

Details

Summary

VLD2 instructions cannot be predicated, so we cannot tail predicate them from autovec. From intrinsics, though, they should be valid, as they will just end up loading extra values into the off vector lanes without affecting the on lanes. The same is true for loads in general: so long as we are not using the other vector lanes, an unpredicated load can be converted to a predicated one.

This marks VLD2 and VLD4 instructions as validForTailPredication and allows any unpredicated load in a tail-predicated loop, which I believe will be valid given the other checks we have (but I may be mistaken). Some checks I could think of appear to be missing (to do with moving between vector lanes), but that seems to be an orthogonal issue.
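The load-conversion argument above can be illustrated with a small scalar model. This is only a sketch under stated assumptions, not MVE semantics or the code from this patch: `LANES`, `active_mask`, `load`, `store`, and `run_loop` are hypothetical names, a predicated load is assumed to zero its off lanes, and the source buffer is assumed to be padded to a whole number of vectors so the unpredicated load stays in bounds.

```python
LANES = 4  # assumed vector width in elements (hypothetical, e.g. 4 x i32)


def active_mask(remaining):
    """Lane predicate for one iteration: lane i is on while i < remaining."""
    return [i < remaining for i in range(LANES)]


def load(mem, base, mask=None):
    """Model a vector load. With a mask, off lanes read as zero (an assumption
    of this model); without one, every lane is loaded."""
    return [mem[base + i] if (mask is None or mask[i]) else 0
            for i in range(LANES)]


def store(mem, base, vec, mask):
    """Model a predicated store: only on lanes write to memory."""
    for i in range(LANES):
        if mask[i]:
            mem[base + i] = vec[i]


def run_loop(src, n, predicate_loads):
    """Add 1 to the first n elements, LANES at a time, using lane-wise ops only."""
    dst = [0] * len(src)
    for base in range(0, n, LANES):
        mask = active_mask(n - base)
        vec = load(src, base, mask if predicate_loads else None)
        vec = [x + 1 for x in vec]  # lane-wise: no movement between lanes
        store(dst, base, vec, mask)
    return dst
```

In this model `run_loop(src, 6, False)` and `run_loop(src, 6, True)` produce the same `dst` for an 8-element `src`, because the off lanes never reach a store. The equivalence relies on the operation being lane-wise; an operation that moved data between lanes would break it, which is the kind of missing check mentioned in the summary.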

Event Timeline

dmgreen created this revision. Aug 15 2020, 10:55 AM
dmgreen requested review of this revision. Aug 15 2020, 10:55 AM

This has made me think a bit.. but it seems to make sense and from looking again at the existing checks, they look enough for correctness - to my pleasant surprise!
It feels odd though, that these instructions can't be used in a VPT block but we can use them here... I guess because they actually use mechanisms. But have we got confirmation that both/all registers are always guaranteed to be updated properly in a tail-predicated loop? I don't think there's anything in the reference manual about that limitation but I also haven't found it very clear anyway! One case which we'd have to avoid is if, in a horrible circumstance, LR is used as an address register which is post-indexed and predication could become UNKNOWN, but that isn't specific to VLD2/4.

This has made me think a bit.. but it seems to make sense and from looking again at the existing checks, they look enough for correctness - to my pleasant surprise!
It feels odd though, that these instructions can't be used in a VPT block but we can use them here... I guess because they actually use mechanisms. But have we got confirmation that both/all registers are always guaranteed to be updated properly in a tail-predicated loop? I don't think there's anything in the reference manual about that limitation but I also haven't found it very clear anyway!

Yeah. The architectural manual doesn't use the predicate in the VLD2/4 code. So they can't be predicated and always just load the whole vector width of values. This is OK so long as the original code said it was safe to load from that many memory locations, and as this is from intrinsics (not autovec) that seems valid. I even tested it. And for normal loads it's OK to load fewer values if they are not going to be used.
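As a rough illustration of the point about memory safety, here is a toy model of an unpredicated deinterleaving load feeding a tail-predicated store. It is a sketch only: `vld2` and `sum_pairs` are hypothetical helpers, `LANES` is an assumed width, and the `IndexError` stands in for an out-of-bounds access rather than for any real hardware behaviour.

```python
LANES = 4  # assumed number of lanes per vector (hypothetical)


def vld2(mem, base):
    """Model an unpredicated deinterleaving load: always reads 2 * LANES elements."""
    chunk = mem[base:base + 2 * LANES]
    if len(chunk) < 2 * LANES:
        # Stand-in for reading memory the original code never promised was safe.
        raise IndexError("VLD2 model read past the end of the buffer")
    return chunk[0::2], chunk[1::2]


def sum_pairs(mem, n_pairs):
    """Compute a[i] + b[i] for n_pairs interleaved (a, b) pairs.
    The load is full width; only the store is tail predicated."""
    out = [None] * n_pairs
    for first in range(0, n_pairs, LANES):
        evens, odds = vld2(mem, 2 * first)
        for lane in range(LANES):
            if first + lane < n_pairs:  # tail predicate: only on lanes store
                out[first + lane] = evens[lane] + odds[lane]
    return out
```

With a 16-element buffer and `n_pairs = 6`, the final iteration loads two pairs it never uses and the result is unaffected. With a 12-element buffer the same call fails in the model, which mirrors why this is only acceptable when the source (intrinsics here, not autovec) already implied that the full-width load was safe.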

One case which we'd have to avoid is if, in a horrible circumstance, LR is used as an address register which is post-indexed and predication could become UNKNOWN, but that isn't specific to VLD2/4.

Do you mean:

The behavior of a beat-wise capable instruction that modifies LR and is within a tail predicated low overhead loop
is CONSTRAINED UNPREDICTABLE, the permitted behaviors are either of

That's a new one to me.

samparker accepted this revision. Aug 18 2020, 8:16 AM

Yeah. The architectural manual doesn't use the predicate in the VLD2/4 code. So they can't be predicated and always just load the whole vector width of values.

Okay.

The behavior of a beat-wise capable instruction that modifies LR and is within a tail predicated low overhead loop is CONSTRAINED UNPREDICTABLE

Yeah, me too... but not your problem here. Thanks.

This revision is now accepted and ready to land. Aug 18 2020, 8:16 AM
This revision was automatically updated to reflect the committed changes.