Hi,
This is a different approach to what I was trying to achieve with D111237. When testing the former patch I found that it was becoming increasingly difficult to undo the splitting of the Loads and Exts that InstCombine was doing. So I decided instead to prevent the sinking of the Exts in the cases where sinking them doesn't make sense.
I added some AArch64 and Arm tests, since those were the ones I knew for sure this would affect, but it might also affect other targets with widening loads.
I benchmarked SPEC2017 intrate and saw no significant differences in either size or performance. The patch does, however, improve codegen in Snappy, where a workaround was required on AArch64 to avoid a superfluous 'ands' (ZExt). Removing that workaround shows a performance degradation that this patch recovers.
Would welcome some help benchmarking this!