This implements the target hook shouldFavorPostInc for AArch64. LoopStrengthReduce queries this hook and can then bring loops into a form that is better suited to generating post-increments.
Motivating examples are similar to test/CodeGen/AArch64/arm64-scaled_iv.ll. I first changed that test locally from an opt test to an llc test, and the diff here is against that local change. Using that example, it is more efficient to transform this:
  LBB0_1:
    ldr   d0, [x1, x8]
    ldr   d1, [x10, x8]
    fmul  d0, d0, d1
    str   d0, [x9, x8]
    add   x8, x8, #8
    cmp   w8, #152
    b.ne  LBB0_1

into this:

  LBB0_1:
    ldr   d0, [x1], #8
    ldr   d1, [x9], #8
    subs  w10, w10, #1
    fmul  d0, d0, d1
    str   d0, [x8], #8
    b.ne  LBB0_1
This improves one benchmark by 1.2% and shows no change in another, which I did not expect to be affected and ran only as a sanity check.
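For readers less familiar with the two loop shapes, here is a rough source-level sketch in C of the difference between them. It is an illustration only and is not taken from the test file: the function names, the element-wise multiply, and the parameter n are assumptions made for the example. The first function uses one shared index for all accesses and for the exit test (the add + cmp shape); the second advances each pointer at its own access and counts down separately (the post-increment + subs shape).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration: indexed form. A single offset i addresses
   all three arrays and is also compared against the loop bound. */
void mul_indexed(double *dst, const double *a, const double *b, size_t n) {
    for (size_t i = 0; i < n; ++i)
        dst[i] = a[i] * b[i];
}

/* Hypothetical illustration: post-increment form. Each pointer is bumped
   as part of its own access, and a separate down-counter ends the loop.
   The do/while shape assumes at least one iteration. */
void mul_postinc(double *dst, const double *a, const double *b, size_t n) {
    assert(n > 0);
    size_t remaining = n;
    do {
        *dst++ = *a++ * *b++;
    } while (--remaining != 0);
}
```

Both functions compute the same result; which shape the backend ends up emitting for a given loop depends on LoopStrengthReduce and the target's cost model, not on how the source is written.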