This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Fold VCMP into VPT
ClosedPublic

Authored by dmgreen on Aug 22 2019, 2:15 AM.

Details

Summary

MVE has VPT instructions, which perform the duties of both a VCMP and a VPST in a single instruction, performing the compare and starting the VPT block in one. This teaches the MVEVPTBlockPass to fold these, searching back through the basic block for a valid VCMP and creating the VPT from its operands.
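The backward search and fold described above can be illustrated with a small standalone sketch. This is not the code from MVEVPTBlockPass.cpp: the `Inst` struct, the `Opcode` enum, the integer register numbering, and the helper names `findFoldableVCMP` and `foldVCMPIntoVPT` are all hypothetical, and the validity checks (no intervening read or write of VPR, no redefinition of the VCMP operands) are conservative assumptions about what a "valid VCMP" might mean, not a statement of what the patch checks.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Toy stand-ins for machine instructions; these are NOT LLVM types.
enum class Opcode { VCMP, VPST, VPT, Other };

constexpr int VPR = 0;  // the predicate register is modelled as register 0

struct Inst {
  Opcode op;
  int def = -1;   // register written, or -1 for none
  int src1 = -1;  // first source register, or -1 for none
  int src2 = -1;  // second source register, or -1 for none
};

// Walk backwards from the VPST at vpstIdx looking for a VCMP that defines VPR.
// Give up if anything in between touches VPR or redefines a VCMP operand.
std::optional<std::size_t> findFoldableVCMP(const std::vector<Inst> &block,
                                            std::size_t vpstIdx) {
  assert(vpstIdx < block.size() && block[vpstIdx].op == Opcode::VPST);
  std::vector<int> redefined;  // registers written between candidate and VPST
  for (std::size_t i = vpstIdx; i-- > 0;) {
    const Inst &I = block[i];
    if (I.op == Opcode::VCMP && I.def == VPR) {
      for (int r : redefined)
        if (r == I.src1 || r == I.src2)
          return std::nullopt;  // operands changed, so the compare cannot move
      return i;
    }
    if (I.def == VPR || I.src1 == VPR || I.src2 == VPR)
      return std::nullopt;  // another instruction uses or clobbers VPR
    if (I.def >= 0)
      redefined.push_back(I.def);
  }
  return std::nullopt;  // reached the start of the block without a VCMP
}

// Replace the VPST with a VPT built from the VCMP operands, then drop the VCMP.
bool foldVCMPIntoVPT(std::vector<Inst> &block, std::size_t vpstIdx) {
  std::optional<std::size_t> cmp = findFoldableVCMP(block, vpstIdx);
  if (!cmp)
    return false;
  const Inst vcmp = block[*cmp];
  block[vpstIdx] = Inst{Opcode::VPT, VPR, vcmp.src1, vcmp.src2};
  block.erase(block.begin() + static_cast<std::ptrdiff_t>(*cmp));
  return true;
}
```

In this model the fold succeeds for a VCMP, an unrelated instruction, and a VPST in sequence, and is rejected when the intervening instruction writes one of the VCMP's source registers.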

There are some changes to the VPT instructions to accommodate this: the operand order is altered to better match the VCMP, and P0 register defs are changed to VPR defs, as is done in other places.

Event Timeline

dmgreen created this revision. Aug 22 2019, 2:15 AM

I was tired of looking at so many VPST instructions.

As an aside, P0 seems like a better choice for codegen than VPR from what I understand, as it more accurately specifies what is modified in the registers. I feel like the registers should be more "related" than they are, being subregisters of each other or something. But I've not looked into that more thoroughly.

samparker added inline comments. Aug 27 2019, 5:27 AM
llvm/lib/Target/ARM/MVEVPTBlockPass.cpp
226

Would be nice to only have to call this once.

dmgreen updated this revision to Diff 217465. Aug 27 2019, 12:05 PM
dmgreen marked an inline comment as done.

Rebased and addressed comments.

samparker accepted this revision. Aug 28 2019, 2:08 AM

Cheers, LGTM.

This revision is now accepted and ready to land. Aug 28 2019, 2:08 AM
This revision was automatically updated to reflect the committed changes.