The old CPU model only had MLA->MLA forwarding. I added some more
accumulate op to accumulate op, and non-accumulate op (e.g. MUL, shift)
to relevant accumulate op (e.g. MLA, SRA) forwarding according to the
Cortex A57 Software Optimization Guide.
The patch improves performance in some internal benchmarks and
causes no significant regressions (none in SPEC).