Interleaved SIMD store instructions can be very costly on certain targets such as Exynos. On such targets, we can split an inefficient interleaved store into multiple instructions, after checking the latency of the replacement instructions.
For example, the instruction
st2 {v0.4s, v1.4s}, addr
can be replaced by
zip1 v2.4s, v0.4s, v1.4s
zip2 v3.4s, v0.4s, v1.4s
stp q2, q3, addr
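To illustrate why the replacement above stores the same bytes, here is a minimal lane-level sketch in C++. It is only a model of the instruction semantics, not code from the pass: the names `Vec4`, `Mem8`, `st2`, `zip1`, `zip2`, `stp`, and `replacementMatchesSt2` are hypothetical helpers introduced for this example, and the latency check is not modelled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical models: one .4s register and the 32 bytes written at addr.
using Vec4 = std::array<std::uint32_t, 4>;
using Mem8 = std::array<std::uint32_t, 8>;

// Models st2 {v0.4s, v1.4s}, addr: lanes of v0 and v1 are stored interleaved.
Mem8 st2(const Vec4 &v0, const Vec4 &v1) {
  Mem8 mem{};
  for (std::size_t i = 0; i < 4; ++i) {
    mem[2 * i] = v0[i];
    mem[2 * i + 1] = v1[i];
  }
  return mem;
}

// Models zip1: interleaves the lower halves of a and b.
Vec4 zip1(const Vec4 &a, const Vec4 &b) { return {a[0], b[0], a[1], b[1]}; }

// Models zip2: interleaves the upper halves of a and b.
Vec4 zip2(const Vec4 &a, const Vec4 &b) { return {a[2], b[2], a[3], b[3]}; }

// Models stp q2, q3, addr: stores q2 followed immediately by q3.
Mem8 stp(const Vec4 &q2, const Vec4 &q3) {
  Mem8 mem{};
  for (std::size_t i = 0; i < 4; ++i) {
    mem[i] = q2[i];
    mem[4 + i] = q3[i];
  }
  return mem;
}

// True when the zip1/zip2/stp sequence writes the same memory image as st2.
bool replacementMatchesSt2(const Vec4 &v0, const Vec4 &v1) {
  return st2(v0, v1) == stp(zip1(v0, v1), zip2(v0, v1));
}

// Small self-check on one pair of example registers.
inline void checkReplacement() {
  assert(replacementMatchesSt2(Vec4{1, 2, 3, 4}, Vec4{5, 6, 7, 8}));
}
```

In this model both sequences produce the same memory image for any pair of inputs, so the choice between them is purely a question of cost on the target.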
I feel like the name of this pass/file should be changed, as it no longer covers what the pass does.
The pass now seems to be about replacing certain SIMD patterns with longer but more efficient SIMD patterns.
Maybe AArch64SIMDOptimizer? That might also fit with the file names of some of the other optimizer passes already in the AArch64 target.