Pre- and post-index loads and stores are modelled incorrectly: the latency of the address update (the base-register writeback) is too high. This is easy to see in llvm-mca's output, and we discussed it earlier here:
https://github.com/llvm/llvm-project/issues/61047#issuecomment-1452120079
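To illustrate why the writeback latency matters, here is a minimal sketch of a chain of post-index loads such as `ldr x1, [x0], #8`, where each load needs the updated base register from the previous one. The function name and the latency values are hypothetical and chosen only for illustration; they are not taken from the Neoverse V2 scheduling model or from llvm-mca.

```python
def chain_cycles(num_loads: int, writeback_latency: int) -> int:
    """Cycle at which the last base-register update in the chain is available.

    Each post-index load reads the base register and writes the updated
    address back, so the next load cannot issue before that writeback is ready.
    Only this dependency is modelled; issue width and ports are ignored.
    """
    ready = 0  # cycle at which the base register is available
    for _ in range(num_loads):
        ready += writeback_latency
    return ready


# Hypothetical latencies, for illustration only.
LOAD_LATENCY = 4  # writeback wrongly modelled with the load-result latency
ALU_LATENCY = 1   # writeback modelled as a simple address increment

if __name__ == "__main__":
    print(chain_cycles(100, LOAD_LATENCY))
    print(chain_cycles(100, ALU_LATENCY))
```

Under these assumptions, the estimated length of the dependency chain scales directly with the latency assigned to the address update, which is why an inflated value shows up so clearly in throughput estimates.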
Part of the problem is the way operands are defined in the scheduling model. It affects other operands as well (see the fadd in https://godbolt.org/z/d1Gbr48cE), but fixing the loads and stores seems to be a good start.
This fixes the problem for the Neoverse V2. The other Neoverse cores are affected too (the different scheduling models influenced each other), but I won't be fixing those here, as that would require checking performance. For the V2, this change is also fine performance-wise.
This seems to match both the pre- and post-index forms, as does the change above. Does that matter?