This moves the X86 specific transform from rL364407 into DAGCombiner to generically handle 'little to big' cases (e.g. extract_subvector(v2i64 bitcast(v16i8))), this allows us to remove both the x86 implementation and the aarch64 bitcast(extract_subvector(bitcast())) combine (note none of the regressions are due to removing this).
We see a number of regressions which is why I went for making it x86 in the first place:
AArch64's poor handling of shuffle(bitcast(),bitcast()) cases - AArch64ISD::DUPLANE cases in particular as well as "extract high" patterns.
The merge-store.ll change is interesting - AArch64TargetLowering::allowsMisalignedMemoryAccesses permits 'fast' unaligned v2i64 stores in all cases which is being exposed by removing the bitcast.
Most likely we need to address these issues before this patch can proceed - I'm happy to help if someone can suggest what is the most critical to fix first.