This is similar to what I recently did for getArithmeticReductionCost.
The cost now accounts for the narrowing from 512->256->128 at each step of the reduction.
I've also added a new helper method, getMinMaxCost, that uses the cost of
native min/max instructions where we have them and falls back to
cmp+select where we don't.
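
To make the intent concrete, here is a rough standalone sketch of the idea, not the code in this patch. The struct, the function names (minMaxCost, minMaxReductionCost), the ExtractHalf term and every cost value are hypothetical placeholders. It assumes each narrowing step pays for one extract of the upper half plus one min/max at the narrower width.

```cpp
// Hypothetical per-operation costs. These are illustrative inputs only and
// are not taken from any real cost table.
struct OpCosts {
  unsigned NativeMinMax; // one native min/max instruction
  unsigned Compare;      // one vector compare
  unsigned Select;       // one vector select
  unsigned ExtractHalf;  // extracting the upper half of a vector
};

// Cost of a single min/max: use the native instruction if the target has
// one, otherwise model it as cmp+select.
unsigned minMaxCost(bool HasNativeMinMax, const OpCosts &C) {
  return HasNativeMinMax ? C.NativeMinMax : C.Compare + C.Select;
}

// Cost of narrowing a reduction from VectorWidth down to MinWidth by
// repeated halving (for example 512 -> 256 -> 128). Each halving step is
// charged one extract plus one min/max. The remaining in-register steps
// below MinWidth are deliberately left out of this sketch.
unsigned minMaxReductionCost(unsigned VectorWidth, unsigned MinWidth,
                             bool HasNativeMinMax, const OpCosts &C) {
  unsigned Cost = 0;
  for (unsigned Width = VectorWidth; Width > MinWidth; Width /= 2)
    Cost += C.ExtractHalf + minMaxCost(HasNativeMinMax, C);
  return Cost;
}
```

With unit costs for every operation, going from 512 to 128 takes two halving steps, so the sketch charges 4 with native min/max and 6 with the cmp+select fallback.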
This change both increases and decreases costs, depending on the case.
Please point out any changes you think are needed.
I'm not fully convinced by some of the numbers in the cost tables,
but I'd like to address those in a follow-up so they don't get
lost in this diff.