This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Increase cost of v2i64 multiplies
Closed Public

Authored by dmgreen on Apr 3 2022, 1:56 PM.

Details

Summary

The cost of a v2i64 multiply was special-cased in D92208 as scalarized into 4*extract + 2*insert + 2*mul. Scalarizing to/from GPRs is expensive though, and the cost wasn't high enough to prevent vectorization in places where it can be detrimental to performance. This patch raises the cost of each copy to/from a GPR to 2, bringing the total cost to 14. So long as umull/smull are handled correctly (as in D123006), this seems to lead to better vectorization factors and better performance.
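The arithmetic behind the new total can be checked with a small sketch. This is only an illustration of the formula in the summary, not the code from the patch: the struct and function names are hypothetical, and the multiply cost of 1 (and the unit copy costs used for the "before" figure) are assumptions inferred from the stated total of 14.

```cpp
// Hypothetical helper types; these names do not appear in LLVM.
struct ScalarizedMulCost {
  int ExtractCost; // one vector lane -> GPR copy
  int InsertCost;  // one GPR -> vector lane copy
  int MulCost;     // one scalar 64-bit multiply (assumed to cost 1)
};

// A v2i64 multiply is modelled as scalarized: two lanes are extracted
// from each of the two operands (4 extracts), multiplied pairwise
// (2 muls), and the two products are inserted back (2 inserts).
constexpr int v2i64MulCost(ScalarizedMulCost C) {
  return 4 * C.ExtractCost + 2 * C.InsertCost + 2 * C.MulCost;
}

// Assumed "before" model: every operation costs 1.
constexpr int OldCost = v2i64MulCost({1, 1, 1});
// "After" model from the summary: copies to/from GPRs cost 2 each.
constexpr int NewCost = v2i64MulCost({2, 2, 1});

static_assert(OldCost == 8, "4*1 + 2*1 + 2*1");
static_assert(NewCost == 14, "4*2 + 2*2 + 2*1");
```

Under these assumptions the cost rises from 8 to 14, which is the total quoted in the summary.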

Event Timeline

dmgreen created this revision. Apr 3 2022, 1:56 PM
Herald added a project: Restricted Project. Apr 3 2022, 1:56 PM
dmgreen requested review of this revision. Apr 3 2022, 1:56 PM
SjoerdMeijer accepted this revision. Apr 4 2022, 1:35 AM

Sounds reasonable.

This revision is now accepted and ready to land. Apr 4 2022, 1:35 AM

Thanks for the patch. I am +1.
I hope other AArch64 folks are also happy with this cost.

This revision was landed with ongoing or failed builds. Apr 4 2022, 9:42 AM
This revision was automatically updated to reflect the committed changes.