We have found that -- when the selected subarchitecture has a scheduling model and we are not optimizing for size -- the machine-instruction combiner uses a too-simple algorithm to compute the cost of one of the two alternatives [before and after running a combining pass on a section of code], and therefor it throws away the combination results too often.
This fix has the potential to help any ISA with the potential to combine instructions and for which at least one subarchitecture has a scheduling model.
As of now, this is only known to definitely affect AArch64 subarchitectures with a scheduling model.
Patch by Abe Skolnik and Sebastian Pop.
Regression tested on AMD64/GNU-Linux, new test case tested to fail on an unpatched compiler and pass on a patched compiler.