This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Introduce a MQPRCopy
ClosedPublic

Authored by dmgreen on Oct 4 2021, 3:58 AM.

Details

Summary

Currently when creating tail predicated loops, we need to validate that all the live-outs of a loop will be equivalent with and without tail predication, and if they are not we cannot legally create a tail-predicated loop, leaving expensive vctp and vpst instructions in the loop. These notably can include register-allocation instructions like stack loads and stores, and copys lowered from COPYs to MVE_VORRs.

Instead of trying to prove this is valid late in the pipeline, this patch introduces a MQPRCopy pseudo instruction that COPY is lowered to. This can then either be converted to a MVE_VORR where possible, or to a couple of VMOVD instructions if not. This way they do not behave differently within and outside of tail-predications regions, and we can know by construction that they are always valid. The idea is that we can do the same with stack load and stores, converting them to VLDR/VSTR or VLDM/VSTM where required to prove tail predication is always valid.

This does unfortunately mean inserting multiple VMOVD instructions, instead of a single MVE_VORR, but my experiments show it to be an improvement in general.

Diff Detail

Event Timeline

dmgreen created this revision.Oct 4 2021, 3:58 AM
dmgreen requested review of this revision.Oct 4 2021, 3:58 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 4 2021, 3:58 AM
samtebbs added inline comments.Oct 4 2021, 6:21 AM
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
1069

valid -> validate

1071

Rather than re-creating the live out list, would it be possible to do this MQPRCopy checking when constructing the live out list initially, or is the full list of liveouts needed before doing this checking for MQPRCopy instances?

dmgreen updated this revision to Diff 377262.Oct 5 2021, 8:29 AM
dmgreen added inline comments.
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
1071

Yeah, this code was a little odd. It is really meant to be a worklist, that we may need to add extra elements to for the copy values we treat as live-out.

I've changed it to a proper worklist. Let me know what you think.

samtebbs accepted this revision.Oct 5 2021, 8:35 AM

Looks good to me, with one more question.

llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
1068

Does it need to clear the list if it's returning immediately after?

This revision is now accepted and ready to land.Oct 5 2021, 8:35 AM
dmgreen added inline comments.Oct 7 2021, 2:42 AM
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
1068

Yeah, It's a class variable used elsewhere. If we fail to tail predicate we might as well create vorr's, not vmovds for the code that remains.

This revision was landed with ongoing or failed builds.Oct 7 2021, 4:52 AM
This revision was automatically updated to reflect the committed changes.
samtebbs added inline comments.Oct 7 2021, 6:59 AM
llvm/lib/Target/ARM/ARMLowOverheadLoops.cpp
1068

Ah I thought it was stack-allocated.