This is an archive of the discontinued LLVM Phabricator instance.

[ARM][LowOverheadLoops] Allow all MVE instrs.
ClosedPublic

Authored by samparker on Jan 10 2020, 7:44 AM.

Details

Summary

We have a whitelist of instructions that we allow when tail predicating, since these are trivial ones that we've deemed need no special handling. Now change ARMLowOverheadLoops to allow the non-trivial instructions if they're contained within a valid VPT block. Since a valid block is one that is predicated upon the VCTP so we know that these non-trivial instructions will still behave as expected once the implicit predication is used instead.

Diff Detail

Event Timeline

samparker created this revision.Jan 10 2020, 7:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 10 2020, 7:44 AM
SjoerdMeijer accepted this revision.Jan 13 2020, 5:34 AM

LGTM, with just a nit inlined.

llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
479

nit: how about RecordAllowedMVEInsts to do a bit more justice what this function is doing?

This revision is now accepted and ready to land.Jan 13 2020, 5:34 AM

This looks sensible, using the VPT blocks to know that the instructions are predicated on something that makes tail predication valid.

I had couple of questions about some of the existing details. The VPSEL question is the only one really relevant here, the others are probably best looked at elsewhere (or they may be handled already in here somewhere, or I may just be thinking about them wrongly). They shouldn't block this improvement.

llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
525

What if we had something like a VMOV s0 s1. I know that's not anything to do with this patch, but is that outlawed anywhere at the moment?

530

What about VPNOT or VPSEL? They would use both the vpr and the tail predicate to different extents.

VPSEL we seem to mark as IsValidForTailPredication, which I'm not sure about. We need to make sure it's using the "same" predicate as before (and making sure it doesn't just use whatever we had last put there, if we delete the vcpt!)

llvm/test/CodeGen/Thumb2/LowOverheadLoops/vmldava_in_vpt.mir
174

Another unrelated one, but there is nothing in the instructions of this loop that says that this is predicated on a hardware register, right? Or that is has unmodelled side effects? The refs to vpr and the vctp are removed but no other ref is added.

samparker marked 3 inline comments as done.Jan 13 2020, 6:59 AM
samparker added inline comments.
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
525

It's not outlawed., why do we need to be concerned about it?

530

Good point. Yes, it looks like the problem is that we're only recording VPT blocks whereas, as you say, there are other instructions that also need handling. It looks like VPSEL/VPNOT would currently stuffed into the previous VPT block, even if they're not in one. This will mean that the predicates are checked for correctness, but it also could cause an assert too! I'll need to put up a different patch for that change.

llvm/test/CodeGen/Thumb2/LowOverheadLoops/vmldava_in_vpt.mir
174

Yes, at this point it's completely implicit, which I don't think is a problem currently. Maybe we could add a new register and a predicate def for LSTP and LETP, but I have no idea how invasive that would be for the predicate users.

dmgreen added inline comments.Jan 13 2020, 7:18 AM
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
525

If we did VMOV s7, s0; VMOV s6, s1; VMOV s5, s2; VMOV s4, s3, that would be a reverse shuffle of a i32 vector. I would presume that would be trouble for tail prediction as it can no longer really be sure about what lanes are and are not predicated.

530

Sounds good. I think they are both currently outside of IT blocks. We should be folding VPNOT into blocks where we can (creating else's), but are not yet. The same thing could be done for VPSEL in the future, turning it into a VMOVt.

llvm/test/CodeGen/Thumb2/LowOverheadLoops/vmldava_in_vpt.mir
174

I may be possible to add something similar to VPR, to the MVE_P base of all MVE instructions. Might be some differences in register orders though. Like you said, this is very late so might not be a problem currently. It's best not to be implicit if we can help it though.

dmgreen accepted this revision.Jan 14 2020, 12:17 AM

LGTM by the way. I hadn't meant to tread on Sjoerd toes, I didn't see he was taking a look already. The issues were mostly unrelated, this looks like a nice improvement on it's own.

samparker marked an inline comment as done.Jan 14 2020, 1:45 AM
samparker added inline comments.
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
525

The conversion has to be about predication conversion, just making sure that it is equivalent, so these shouldn't be a problem. What needs to happen though is checking for values live in and out.... and then checking whether these scalar regs are aliasing the Q regs. I'm currently working on this.

This revision was automatically updated to reflect the committed changes.