See https://reviews.llvm.org/D6678 for the history of isExtractSubvectorCheap. Essentially the same considerations apply to ARM.
This temporarily breaks the formation of vpadd/vpaddl in certain cases, because AddCombineToVPADDL essentially assumes that we won't form VUZP shuffles. This is mostly orthogonal, though, so I'll fix it in a follow-up.
Currently, the vmov.u16 gets moved in between the vld1s, so I worry that the compiler might do the same with the vorr, in which case the CHECK-NEXT would fail.