This patch changes the cost of sqrt intrinsics for AArch64.
I have very limited knowledge of the cost model, so I tried to pick fairly conservative values as a starting point. Judging by the ongoing work on the X86 side, the values there appear to reflect the latency of the instruction. However, a discussion with @mssimpso led me to believe the AArch64 cost model doesn't directly use instruction latencies.
Any input here would be greatly appreciated.
This change causes a handful of additional cases in SPEC (e.g., povray) to be SLP vectorized. In fact, this may only change codegen when targeting Kryo, where insert and extract element operations are cheaper than on most other sub-targets.
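
To illustrate why cheaper insert/extract element costs can tip the SLP decision, here is a toy sketch of the trade-off. It is not LLVM's actual cost model: `ToyCosts`, `toySLPCostDelta`, and `toyShouldVectorize` are hypothetical names, the cost values are made up, and the "vectorize when the delta is negative" rule is a simplifying assumption.

```cpp
// Toy model only: hypothetical names and numbers, not LLVM's cost tables.
struct ToyCosts {
  int ScalarSqrt;     // assumed cost of one scalar sqrt
  int VectorSqrt;     // assumed cost of one vector sqrt covering all lanes
  int InsertElement;  // assumed cost to insert one scalar into a vector lane
  int ExtractElement; // assumed cost to extract one lane back to a scalar
};

// Vector cost minus scalar cost for a bundle of `Lanes` sqrt calls whose
// operands must be gathered into a vector and whose results must be
// extracted again. A negative value means the vector form looks cheaper.
constexpr int toySLPCostDelta(const ToyCosts &C, int Lanes) {
  return (C.VectorSqrt + Lanes * (C.InsertElement + C.ExtractElement)) -
         Lanes * C.ScalarSqrt;
}

// Simplifying assumption: vectorize only when the delta is strictly negative.
constexpr bool toyShouldVectorize(const ToyCosts &C, int Lanes) {
  return toySLPCostDelta(C, Lanes) < 0;
}
```

With identical sqrt costs, a sub-target with expensive lane moves (say `ToyCosts{4, 4, 2, 2}`) is rejected for a two-lane bundle, while one with free lane moves (`ToyCosts{4, 4, 0, 0}`) is accepted. That is the kind of difference that could explain why only Kryo sees a codegen change.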
Chad
I don't know the range of costs that AArch64 cores can have here. For x86 we tend to qualify these by noting in a comment which core type the costs were taken from. But AArch64 is younger, so it might still be more consistent!