Copy some of the DataLayout checking for int/ptr conversions into BasicTTI, and fix the assumption that all bitcasts between types of the same size are free. We now assume only that bitcasts between ints and ptrs of the same size are free. This allows TTImpl->getUserCost to simply call the concrete implementation of getCastInstrCost.
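To illustrate the change in assumption, here is a minimal standalone sketch. It is not the BasicTTI code: `ToyType`, `Kind`, `oldSameSizeCastCost`, and `newSameSizeCastCost` are hypothetical names, the non-free cost of 1 is a placeholder, and the real implementation also consults DataLayout and type legalization, which this toy model ignores.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for an IR type: a kind plus a bit width.
enum class Kind { Integer, Pointer, Float };

struct ToyType {
  Kind kind;
  unsigned bits;
};

// True for the two kinds the new rule treats as interchangeable.
inline bool isIntOrPtr(const ToyType &t) {
  return t.kind == Kind::Integer || t.kind == Kind::Pointer;
}

// Old assumption: any cast between same-sized types is free.
inline unsigned oldSameSizeCastCost(const ToyType &src, const ToyType &dst) {
  return src.bits == dst.bits ? 0u : 1u; // 1 is a placeholder cost
}

// New assumption: only same-sized int <-> ptr casts are assumed free.
inline unsigned newSameSizeCastCost(const ToyType &src, const ToyType &dst) {
  if (src.bits == dst.bits && isIntOrPtr(src) && isIntOrPtr(dst))
    return 0u;
  return 1u; // placeholder; a real cost model would compute this per target
}

// Small self-check exercising both rules on a few toy types.
inline void checkExamples() {
  const ToyType i64{Kind::Integer, 64};
  const ToyType p64{Kind::Pointer, 64};
  const ToyType f64{Kind::Float, 64};

  assert(newSameSizeCastCost(i64, p64) == 0); // int <-> ptr, same size: free
  assert(oldSameSizeCastCost(i64, f64) == 0); // old rule: free
  assert(newSameSizeCastCost(i64, f64) == 1); // new rule: no longer assumed free
}
```

In this sketch the two rules differ only for same-sized casts where one side is neither an integer nor a pointer, such as the 64-bit integer to 64-bit float case above.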
I checked the x86 codegen for this diff, and creating the vector trunc here looks like an improvement. It's likely a moot point for the motivating real-world example from the bug report, though, because instcombine reduces the IR before SLP sees this form. We may be able to add some load-combining transforms to -vector-combine to catch this.