Since all of the other G_SHUFFLE_VECTOR transforms are moving to the post-legalizer combiner, let's do the same with dup. This is nice, because it lets us split the original code into three separate pieces: matching, register bank selection, and instruction selection.
- Create G_DUP, make it equivalent to AArch64dup
- Add a post-legalizer combine which is 90% a copy-and-paste from tryOptVectorDup, except with shuffle matching closer to what SelectionDAG does in ShuffleVectorSDNode::isSplatMask.
- Teach RegBankSelect about G_DUP. This is necessary because choosing between the FP and GPR forms of dup depends on the register bank being correct.
- Kill tryOptVectorDup, since it's now entirely handled by G_DUP.
- Add testcases for the combine, RegBankSelect, and selection. The selection test gives the same selection results as the old test.
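
For reference, the splat-mask check that the new combine's matching is modelled on can be sketched roughly as follows. This is a standalone illustration, not the code in this patch: getSplatLane is a hypothetical helper name, the mask is a plain std::vector<int> rather than LLVM's own types, and negative entries are assumed to stand for undef lanes. Treating an all-undef mask as a splat of lane 0 is also an assumption of the sketch.

```cpp
#include <optional>
#include <vector>

// Hypothetical helper (not part of LLVM): returns the source lane that a
// shuffle mask splats, or std::nullopt if the mask is not a splat.
// Negative mask entries are treated as undef and match any lane.
std::optional<int> getSplatLane(const std::vector<int> &Mask) {
  int Lane = -1; // First defined lane seen so far; -1 means none yet.
  for (int Idx : Mask) {
    if (Idx < 0)
      continue; // Undef element: compatible with any splat.
    if (Lane < 0)
      Lane = Idx; // Remember the first defined lane.
    else if (Idx != Lane)
      return std::nullopt; // Two different defined lanes: not a splat.
  }
  // Assumption of this sketch: an all-undef (or empty) mask counts as a
  // splat of lane 0.
  return Lane < 0 ? 0 : Lane;
}
```

With this sketch, a mask such as {1, -1, 1, 1} is reported as a splat of lane 1, while {0, 1, 0, 1} is rejected.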