This is another step in vector narrowing - a follow-up to D53784 (and hoping to eventually squash potential regressions seen in D51553).
Most of the x86 test diffs are wins, but the AArch64 diff is probably not. Do we need the target hook to distinguish between "isExtractSubvectorCheap" and "isExtractSubvectorFree"?
This problem probably already exists independent of this patch, but it went unnoticed in the previous patch because there were no regression tests that showed that possibility.
The x86 diff in i64-mem-copy.ll is also close. Given the frequency throttling concerns with using wider vector ops, I think an extra extract to reduce vector width is a reasonable trade-off at this level of codegen, but if we're going to refine the target hook for AArch, we might adjust the x86 override too.
Side note for the ARM folks - I think this applies here?