This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Add a tail-predication loop predicate register
Closed · Public

Authored by dmgreen on Aug 6 2021, 4:52 AM.

Details

Summary

The semantics of tail-predication loops mean that the value of LR at the time an instruction executes determines its predicate. In other words:

mov r3, #3
DLSTP.32 lr, r3     // Start tail predication, lr==3
VADD.s32 q0, q1, q2 // Lanes 0, 1 and 2 are updated in q0.
mov lr, #1
VADD.s32 q0, q1, q2 // Only the first lane is updated.
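A minimal C++ model of the example above, assuming a 128-bit Q register with four 32-bit lanes and that lane i is active exactly when i < LR. The names `QReg`, `tailPredicate32` and `predicatedVAdd` are hypothetical and only illustrate the semantics; they are not LLVM or Arm APIs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of an MVE Q register: four 32-bit lanes.
using QReg = std::array<int32_t, 4>;

// Assumed tail-predication rule for 32-bit elements: lane i is active only
// if i < lr, where lr holds the number of elements still to be processed.
std::array<bool, 4> tailPredicate32(uint32_t lr) {
  std::array<bool, 4> active{};
  for (uint32_t i = 0; i < 4; ++i)
    active[i] = i < lr;
  return active;
}

// Models "VADD.s32 q0, q1, q2" executed under the tail predicate: inactive
// lanes of the destination keep their previous value. The addition wraps
// like the hardware does, computed in unsigned arithmetic to avoid UB.
void predicatedVAdd(QReg &q0, const QReg &q1, const QReg &q2, uint32_t lr) {
  const std::array<bool, 4> active = tailPredicate32(lr);
  for (std::size_t i = 0; i < q0.size(); ++i)
    if (active[i])
      q0[i] = static_cast<int32_t>(static_cast<uint32_t>(q1[i]) +
                                   static_cast<uint32_t>(q2[i]));
}
```

Calling `predicatedVAdd` with `lr == 3` updates lanes 0 to 2, and with `lr == 1` only lane 0, mirroring the two VADDs above.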

This means that the value of LR cannot be spilled and reused in tail-predication regions without potentially altering the behaviour of the program. For example, more lanes than required could be stored, and in the case of a gather those extra lanes might not have been set up, leading to alignment exceptions.
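To sketch why a clobbered LR is dangerous, the hypothetical `predicatedStore` below models a tail-predicated 32-bit vector store in which lane i is written only when i < lr. If the element count had been spilled and LR held some unrelated, larger value at the store, an extra lane would be written past the end of the data. This is an illustrative model under those assumptions, not Arm's architectural pseudocode.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a tail-predicated 32-bit vector store: lane i of q is
// written to mem[base + i] only when i < lr. Returns the number of lanes
// written. mem.at() throws std::out_of_range if a lane falls outside mem.
std::size_t predicatedStore(std::vector<int32_t> &mem, std::size_t base,
                            const std::array<int32_t, 4> &q, uint32_t lr) {
  std::size_t written = 0;
  for (uint32_t i = 0; i < 4; ++i) {
    if (i < lr) {
      mem.at(base + i) = q[i];
      ++written;
    }
  }
  return written;
}
```

In a usage like the one below, the fourth slot stands in for memory that does not belong to the array: with the correct count of 3 it is untouched, while with a larger, unrelated value in LR it is overwritten.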

This patch adds a new "lr" predicate operand to MVE instructions in order to keep a reference to the LR value that they use as a tail predicate. The operand will usually hold the zero register ($noreg), meaning not tail-predicated, and is set to the LR phi value in the MVETPAndVPTOptimisationsPass. This prevents LR from being spilled anywhere it is needed as the tail predicate.
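The idea of the new operand can be sketched with a hypothetical, heavily simplified instruction record instead of LLVM's `MachineInstr`: the operand defaults to `$noreg` (modelled as 0), and a pass points it at the LR phi's virtual register for MVE instructions inside a tail-predicated loop. Afterwards every MVE instruction is a use of the LR value, not just the loop-end instruction. The struct, field and function names and the opcode strings are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a machine instruction (not the LLVM API).
struct Instr {
  std::string opcode; // label only
  bool isMVE;         // MVE vector instruction?
  unsigned tpReg;     // tail-predicate operand; 0 models $noreg
  unsigned otherUse;  // any other register read, e.g. by the loop-end instr
};

// Sketch of the step described above: inside a tail-predicated loop, point
// every MVE instruction's predicate operand at the LR phi's virtual register.
void setTailPredicateOperands(std::vector<Instr> &loopBody,
                              unsigned lrPhiReg) {
  for (Instr &mi : loopBody)
    if (mi.isMVE)
      mi.tpReg = lrPhiReg;
}

// Indices of the instructions that read `reg`; the allocator must make the
// value available at each of these points.
std::vector<std::size_t> usesOf(const std::vector<Instr> &body, unsigned reg) {
  std::vector<std::size_t> uses;
  for (std::size_t i = 0; i < body.size(); ++i)
    if (body[i].tpReg == reg || body[i].otherUse == reg)
      uses.push_back(i);
  return uses;
}
```

Before the pass runs, only the loop-end instruction uses the LR value; afterwards each MVE instruction in the loop does as well.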

A lot of tests needed updating.


Event Timeline

dmgreen created this revision. Aug 6 2021, 4:52 AM
dmgreen requested review of this revision. Aug 6 2021, 4:52 AM
Herald added a project: Restricted Project. Aug 6 2021, 4:52 AM
samtebbs added inline comments. Aug 9 2021, 5:50 AM
llvm/lib/CodeGen/MachineVerifier.cpp
1656–1657

What's the purpose of this change?

From looking at the changes to some of the tests that should produce tail-predicated loops, none of them get the $noreg replaced with $lr. Isn't that supposed to happen in the VPT optimisation pass, or am I looking in the wrong place?

dmgreen added a comment.

There are a couple of MIR tests that I updated to use $lr (like dont-ignore-vctp.mir and subreg-liveness.mir), but for most I just used $noreg to simplify things, in the same way that we still use t2DoLoopStart where most tail-predicated loops would usually use t2DoLoopStartTP. It is the register allocator that this patch targets, so the tests checking the ARMLowOverheadLoops pass needn't change.

The only reproducer I have of the original problem requires a very large test file and downstream scheduling to trigger the same issue.

llvm/lib/CodeGen/MachineVerifier.cpp
1656–1657

The "UnspillableTerminator" property we use for t2LoopEndDec and t2DoLoopStartTP requires their def to have only a single use up to phi elimination in order for them to be lowered correctly. That is already tested in phi elimination, with this verifier check making sure that the unspillable terminators always have a single use. This applies only to virtual registers, not physical registers.

But there is a short window, between phi elimination and the point where the register allocator assigns registers, in which a def that is not (quite yet) a physical register can have multiple uses. So this check makes the verifier slightly more precise, and it no longer reports problems where they are not an issue.
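A minimal sketch of the verifier rule described above, assuming the single-use check is skipped once phi elimination has run (LLVM tracks that state with the `NoPHIs` machine-function property, but the exact condition used in this patch is not shown here). `UnspillableDef` and `violatesSingleUseRule` are hypothetical names, not the MachineVerifier API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical summary of an unspillable terminator's defined register.
struct UnspillableDef {
  bool isVirtual;           // virtual register (true) or physical (false)
  std::size_t nonDebugUses; // number of non-debug uses of the def
};

// Returns true if the verifier should report the def. The single-use
// requirement only matters up to phi elimination, so it is enforced only for
// virtual registers while the function may still contain PHIs.
bool violatesSingleUseRule(const UnspillableDef &def, bool phisEliminated) {
  if (!def.isVirtual || phisEliminated)
    return false;
  return def.nonDebugUses > 1;
}
```

Only a virtual def with more than one use, checked while PHIs may still be present, is reported; the same def after phi elimination, or a physical register, is not.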

samtebbs accepted this revision. Sep 1 2021, 9:14 AM

Your comments make sense 👍 LGTM

This revision is now accepted and ready to land. Sep 1 2021, 9:14 AM
This revision was landed with ongoing or failed builds. Sep 2 2021, 5:43 AM
This revision was automatically updated to reflect the committed changes.