I'm not sure if this matches the suggestion from D90554, but it's based on the same idea that I'm using for the basic model implementation and have partly already done for x86: the throughput cost is the number of instructions/uops, so the size/blended costs are identical except in special cases (for example, fdiv or other always-known-expensive machine instructions).
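Roughly what I mean, as a minimal standalone sketch (hypothetical names and constants, not the actual TTI hooks):

```cpp
#include <cstdint>

// Hypothetical, simplified cost-kind model for illustration only.
enum class CostKind { RecipThroughput, CodeSize, SizeAndLatency };
enum class Opcode { FAdd, FMul, FDiv };

// Hypothetical helper: number of machine instructions/uops the op lowers to.
int64_t getUopCount(Opcode Op) {
  return 1; // assume single-instruction lowering for this sketch
}

int64_t getInstructionCost(Opcode Op, CostKind Kind) {
  // Size/blended cost kinds default to the same instruction/uop count
  // that the throughput model uses.
  int64_t Cost = getUopCount(Op);
  // Special case: fdiv is always expensive for throughput, but it is still
  // a single instruction, so only the throughput query pays extra.
  if (Op == Opcode::FDiv && Kind == CostKind::RecipThroughput)
    Cost = 4; // illustrative constant, not a measured value
  return Cost;
}
```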
That said, I have no clue about the accuracy of the ARM subtarget variations seen in the cost tests or whether the functional diffs are good or bad.
Can you change this to 1 under size?