ZIP1 should have comparable performance, and gives the register allocator more flexibility.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
Comment Actions
> ZIP1 should have comparable performance
I think on a CPU with 64-bit NEON pipelines, a ZIP will count as a 128-bit instruction, whereas an INS counts as a single 64-bit instruction.
See Note 1 in 4.17 of the Cortex-A55 optimization guide: https://developer.arm.com/documentation/epm128372/2-0/
Comment Actions
Oh, that's unfortunate... I'll just abandon this for now, then; it's not blocking anything for me.
I see a couple of ways forward here:
- Specialize the generated code based on the target CPU.
- Generate zip1, but add an optimization after regalloc to transform zip1 to ins if the destination is equal to one of the source registers.
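For illustration, the second option could look roughly like the sketch below. This is a toy model, not LLVM code: the `Inst` struct, the `Opcode` enum and `rewriteZip1ToIns` are hypothetical names, and it assumes the 2 x 64-bit arrangement, where `zip1 Vd.2d, Vn.2d, Vm.2d` produces `{Vn[0], Vm[0]}`. Under that assumption only the `Vd == Vn` case maps to a single `ins Vd.d[1], Vm.d[0]`; when the destination matches the second source, lane 0 of the destination would have to move, so the sketch leaves that case alone.

```cpp
#include <vector>

// Toy instruction model for illustration only; this is not LLVM's MachineInstr.
enum class Opcode { ZIP1v2i64, INSvi64lane, Other };

struct Inst {
  Opcode op;
  unsigned dst;   // physical register assigned by the register allocator
  unsigned src1;  // ZIP1: Vn, whose lane 0 becomes result lane 0
  unsigned src2;  // ZIP1: Vm, whose lane 0 becomes result lane 1
};

// Post-regalloc peephole (assumed 2D arrangement):
//   zip1 Vd.2d, Vd.2d, Vm.2d  ->  ins Vd.d[1], Vm.d[0]
// Only the dst == src1 case is rewritten, because INS preserves lane 0 of
// the destination. Returns the number of instructions rewritten.
int rewriteZip1ToIns(std::vector<Inst> &insts) {
  int changed = 0;
  for (Inst &inst : insts) {
    if (inst.op != Opcode::ZIP1v2i64 || inst.dst != inst.src1)
      continue;
    // src1 stays equal to dst and now models the tied read of the destination;
    // src2 remains the register whose lane 0 is inserted into lane 1.
    inst.op = Opcode::INSvi64lane;
    ++changed;
  }
  return changed;
}
```

A real implementation would also need to decide, per target CPU, whether the rewrite is profitable at all, which is where the first option comes back in.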