Similar to D123386, this adds D-Movs to the AArch64 perfect shuffle tables, slightly lowering the costs a little more. This is a rough improvement in general, especially if you ignore mov v0.16b, v2.16b type moves that are often artefacts of the calling convention.
The D register movs are encoded as (0x4 | LaneIdx), and to generate a D register move we are required to bitcast into a higher type, but it is otherwise very similar to the S-lane mov's already supported.