This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Add a tablegen pattern for RADDHN/RADDHN2
ClosedPublic

Authored by labrinea on Dec 22 2021, 7:05 AM.

Details

Summary

Converts RSHRN/RSHRN2 to RADDHN/RADDHN2 when the shift amount is half the width of the source vector element. The latter has twice the throughput and half the latency on Arm out-of-order cores. Setting up the zero register adds no latency.
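For readers who want to see why the rewrite is legal, here is a minimal scalar sketch of one lane of each instruction. It is not part of the patch. The helper names `rshrn_lane` and `raddhn_lane` are hypothetical, and the model assumes 16-bit source elements narrowed to 8 bits. With a zero second operand and a shift equal to half the source element width, both compute the top half of the input plus the same rounding constant.

```python
def rshrn_lane(x: int, shift: int, src_bits: int = 16) -> int:
    """Model one lane of RSHRN: rounding shift right, then narrow to half width."""
    half = src_bits // 2
    if not 1 <= shift <= half:
        raise ValueError("shift must be in the range 1..half the source width")
    x &= (1 << src_bits) - 1              # treat the lane as an unsigned bit pattern
    rounded = x + (1 << (shift - 1))      # add the rounding constant
    return (rounded >> shift) & ((1 << half) - 1)


def raddhn_lane(a: int, b: int, src_bits: int = 16) -> int:
    """Model one lane of RADDHN: rounding add, then keep the high half."""
    half = src_bits // 2
    mask = (1 << src_bits) - 1
    total = (a & mask) + (b & mask) + (1 << (half - 1))  # rounding constant
    return (total >> half) & ((1 << half) - 1)


def equivalent_for_all_inputs(src_bits: int = 16) -> bool:
    """Exhaustively compare RSHRN #(src_bits/2) against RADDHN with a zero operand."""
    half = src_bits // 2
    return all(
        rshrn_lane(x, half, src_bits) == raddhn_lane(x, 0, src_bits)
        for x in range(1 << src_bits)
    )
```

Calling `equivalent_for_all_inputs()` checks every 16-bit lane value. The model says nothing about latency, throughput, or the cost of materialising the zero vector, which are the points debated in the review.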

Diff Detail

Unit Tests Failed

Event Timeline

labrinea created this revision. Dec 22 2021, 7:05 AM
labrinea requested review of this revision. Dec 22 2021, 7:05 AM
Herald added a project: Restricted Project. Dec 22 2021, 7:05 AM
This revision is now accepted and ready to land. Dec 23 2021, 5:52 AM
This revision was landed with ongoing or failed builds. Dec 24 2021, 3:31 AM
This revision was automatically updated to reflect the committed changes.

Hmm. Are you sure this one is a great idea? The claim that "Setting up the zero register adds no latency" won't hold on any in-order CPU, and it still has some frontend cost on an out-of-order CPU. The code size will be larger in any case, so this probably shouldn't be done at -Os/-Oz.

The idea with this kind of transform is that it is OK to do so long as it makes some CPU better without making anything else worse. This is intrinsic-only, but it may be best to do it only for specific CPUs when not under minsize, or to do it at a different point where we know the movi can be pulled out of a loop (if we really want to do it at all, and not just leave it to the programmer if they need it).