If we have (extract_subvector(load wide vector)) with no other users, that can just be (load narrow vector).
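To make the fold concrete, here is a minimal sketch on a toy model, not the actual DAGCombiner code. The types `WideLoad` and `NarrowLoad` and the function `narrowExtractedLoad` are hypothetical names invented for illustration, and the bounds check is an assumption of the sketch rather than something stated in this patch. It only shows the idea: if the wide load has a single user, the extract can be replaced by a narrower load at an adjusted address.

```cpp
#include <cstddef>
#include <optional>

// Hypothetical toy model; these are NOT LLVM's SelectionDAG classes.
struct WideLoad {
    std::size_t byteOffset;  // where the wide vector starts in memory
    std::size_t numElts;     // number of elements in the wide vector
    std::size_t eltBytes;    // size of one element in bytes
    unsigned numUses;        // number of users of the loaded value
};

struct NarrowLoad {
    std::size_t byteOffset;  // where the narrow vector starts in memory
    std::size_t numElts;     // number of elements in the narrow vector
    std::size_t eltBytes;    // size of one element in bytes
};

// Models: extract_subvector(load wide), firstElt  ->  load narrow.
// Returns std::nullopt when the fold does not apply.
std::optional<NarrowLoad> narrowExtractedLoad(const WideLoad &ld,
                                              std::size_t firstElt,
                                              std::size_t subElts) {
    // The extract must be the only user of the loaded value.
    if (ld.numUses != 1)
        return std::nullopt;
    // Assumption of this sketch: the subvector lies inside the wide vector.
    if (subElts == 0 || firstElt + subElts > ld.numElts)
        return std::nullopt;
    // Load only the extracted elements, starting at the adjusted address.
    return NarrowLoad{ld.byteOffset + firstElt * ld.eltBytes, subElts,
                      ld.eltBytes};
}
```

For example, extracting the upper four elements of an eight-element vector of 4-byte elements loaded at offset 64 becomes a four-element load at offset 80 in this model. If the wide load had a second user, nothing would change.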
I need help confirming that all of the test diffs are correct. When I saw how many AArch64 tests were changing, I thought something had gone wrong, but on closer inspection, we just delete the '2' from all of those instructions. Hooray for mnemonics that actually make sense!
The memop chain updating is based on code that already exists in multiple places in x86, so I think it should be pulled into a helper function as a follow-up. I wouldn't have come up with that sequence on my own.
Background: this is a potential improvement noticed via regressions caused by making x86's peekThroughBitcasts() not loop on consecutive bitcasts (see comments in D33137).