This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Prefer ZIP1 over INS to lower concat_vectors.
Abandoned · Public

Authored by efriedma on Aug 2 2021, 1:25 PM.

Details

Summary

ZIP1 should have comparable performance, and gives the register allocator more flexibility.
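
For reference, here is a minimal C++ sketch of the two lowerings for concatenating two 64-bit vectors, with each 128-bit register modeled as two 64-bit lanes. All names (`Vec128`, `ins_d1_d0`, `zip1_2d`, `concatViaIns`, `concatViaZip1`, `checkLoweringsAgree`) are hypothetical and exist only for illustration; this is not code from the patch.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy model (an assumption for illustration): a 128-bit NEON register
// viewed as two 64-bit lanes, i.e. the .2d arrangement.
using Vec128 = std::array<std::uint64_t, 2>;

// Models `ins Vd.d[1], Vn.d[0]`: only lane 1 of the destination is written,
// so the destination must already hold the low half (a tied operand).
void ins_d1_d0(Vec128 &vd, const Vec128 &vn) { vd[1] = vn[0]; }

// Models `zip1 Vd.2d, Vn.2d, Vm.2d`: both lanes are written from two
// independent sources, so the destination can be any register.
Vec128 zip1_2d(const Vec128 &vn, const Vec128 &vm) { return {vn[0], vm[0]}; }

// concat_vectors(lo, hi) via INS: the result reuses the register holding lo.
Vec128 concatViaIns(Vec128 lo, const Vec128 &hi) {
  ins_d1_d0(lo, hi);
  return lo;
}

// concat_vectors(lo, hi) via ZIP1: the result is a fresh value.
Vec128 concatViaZip1(const Vec128 &lo, const Vec128 &hi) {
  return zip1_2d(lo, hi);
}

// Sanity check that both lowerings agree on the concatenated value.
void checkLoweringsAgree(const Vec128 &lo, const Vec128 &hi) {
  assert(concatViaIns(lo, hi) == concatViaZip1(lo, hi));
}
```

In this model the only difference is the operand constraint: the INS form overwrites the register holding the low half, while the ZIP1 form leaves both inputs intact. That is the extra freedom the summary refers to.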

Event Timeline

efriedma created this revision. Aug 2 2021, 1:25 PM
efriedma requested review of this revision. Aug 2 2021, 1:25 PM
Herald added a project: Restricted Project. Aug 2 2021, 1:25 PM

ZIP1 should have comparable performance

I think on a CPU with 64-bit NEON pipelines, a ZIP1 will count as a 128-bit instruction, whereas an INS will count as a single 64-bit instruction.
See Note 1 in section 4.17 of the Cortex-A55 optimization guide: https://developer.arm.com/documentation/epm128372/2-0/

Oh, that's unfortunate... I'll just abandon this for now, then; it's not blocking anything for me.

I see a couple of ways forward here:

  1. Specialize the generated code based on the target CPU.
  2. Generate zip1, but add an optimization after regalloc to transform zip1 to ins if the destination is equal to one of the source registers.
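
The second option could look roughly like the sketch below. It uses a toy instruction and register-file model, not LLVM's MachineInstr API; `Opcode`, `Inst`, `RegFile`, `execute`, and `rewriteZip1ToIns` are all hypothetical names introduced for illustration. It handles only the case where the destination equals the first source, since that is the case where a single INS produces the same result.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>

// Toy model (assumption): 128-bit registers as two 64-bit lanes.
using Vec128 = std::array<std::uint64_t, 2>;
using RegFile = std::array<Vec128, 4>;

// Hypothetical opcodes for `zip1 Vd.2d, Vn.2d, Vm.2d` and `ins Vd.d[1], Vm.d[0]`.
enum class Opcode { Zip1_2D, Ins_D1_D0 };

// Hypothetical post-regalloc instruction with physical register numbers.
struct Inst {
  Opcode op;
  unsigned dst;   // Vd
  unsigned src1;  // Vn for ZIP1; equal to dst for INS (tied operand)
  unsigned src2;  // Vm: provides lane 0 of the upper-half source
};

// Executes one instruction on the toy register file.
void execute(const Inst &mi, RegFile &regs) {
  assert(mi.dst < regs.size() && mi.src1 < regs.size() && mi.src2 < regs.size());
  if (mi.op == Opcode::Zip1_2D) {
    regs[mi.dst] = {regs[mi.src1][0], regs[mi.src2][0]};
  } else {
    regs[mi.dst][1] = regs[mi.src2][0];  // lane 0 of dst is preserved
  }
}

// Rewrites ZIP1 to INS when Vd == Vn, because lane 0 is then already in place.
// Returns std::nullopt when no single-INS rewrite applies.
std::optional<Inst> rewriteZip1ToIns(const Inst &mi) {
  if (mi.op != Opcode::Zip1_2D || mi.dst != mi.src1)
    return std::nullopt;
  return Inst{Opcode::Ins_D1_D0, mi.dst, mi.dst, mi.src2};
}
```

In this model, when the destination equals only the second source, lane 0 would also have to be overwritten, so the sketch leaves the ZIP1 in place. A real pass would also need to decide whether the rewrite is profitable for the target CPU.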
efriedma abandoned this revision. Aug 2 2021, 4:53 PM