Under AArch64, an insert subvector can often be done as a single lane mov operation. For example, a v4i8 inserted into a v16i8 is an "s" lane mov, so long as the index is a multiple of 4. This patch teaches the cost model about that, using code copied over from the X86 backend.
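As a rough illustration of the condition described above, here is a minimal standalone sketch. It is not the code from this patch: `FixedVec` and `isSingleLaneInsert` are hypothetical names, and the checks (lane-sized subvector, aligned index, 64- or 128-bit destination) are simplifying assumptions about what "single lane mov" requires.

```cpp
// Hypothetical, simplified stand-in for a fixed-width vector type
// (e.g. v16i8 is {16, 8}); not an LLVM class.
struct FixedVec {
  unsigned NumElts;
  unsigned EltBits;
  unsigned totalBits() const { return NumElts * EltBits; }
};

// Returns true if inserting Sub into Dst at element offset Index could be
// modelled as one lane move, under the assumptions stated above.
bool isSingleLaneInsert(FixedVec Dst, FixedVec Sub, unsigned Index) {
  // Element types must match for this to be a plain subvector insert.
  if (Dst.EltBits != Sub.EltBits)
    return false;

  // The whole subvector must be the size of one b/h/s/d lane.
  unsigned SubBits = Sub.totalBits();
  if (SubBits != 8 && SubBits != 16 && SubBits != 32 && SubBits != 64)
    return false;

  // Assume the destination is a 64- or 128-bit vector register,
  // and strictly wider than the subvector.
  unsigned DstBits = Dst.totalBits();
  if ((DstBits != 64 && DstBits != 128) || SubBits >= DstBits)
    return false;

  // The insert position must be aligned to the subvector size
  // (e.g. a multiple of 4 for v4i8) and stay within the destination.
  if (Index % Sub.NumElts != 0)
    return false;
  return Index + Sub.NumElts <= Dst.NumElts;
}
```

With this sketch, a v4i8 inserted into a v16i8 at index 4 qualifies, while the same insert at index 2 does not.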
Some of the costs (v16i16_4_0) are still high because they are matched as "SK_Select" rather than "SK_InsertSubvector". D120879 has some codegen tests for inserting subvectors, which I will add as llvm/test/CodeGen/AArch64/insert-subvector.ll.
nit: It seems more natural to write isa<FixedVectorType>(Tp)