The old CPU model only had MLA->MLA forwarding. I added the missing
MUL->MLA read advances and a missing read advance for the absolute
difference accumulator, following the Cortex-A57 Software Optimization Guide.
The patch improves performance in EEMBC rgbyiqv2 by about 6%-7% and
spec2006/milc by 8% (repeated runs on multiple devices), and causes no
significant regressions (none in SPEC).
Can you update this comment to explain why this is different, not why it has _changed_ (which doesn't mean a lot once the code is in-tree)?
So something like "Use a WriteRes as opposed to SchedAlias for advance lookup".