Teach llvm.aarch64.neon.pmull64 to use higher-half registers (i.e., select pmull2) when
exactly one operand is already in the higher half.
- The other (non-higher-half) operand must be in an FP & SIMD register regardless of this patch, so dup it into the right lane rather than fmov-ing the higher-half operand into the lower half.
This is at least a tie, and in most cases a win, e.g. when a {pmull, pmull2} pair operates on the lower and higher halves of the same source operand (see llvm/test/CodeGen/AArch64/aarch64-pmull2.ll).
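The lane argument above can be checked with a small portable C model. This is a sketch, not LLVM code: the types `v2i64`/`u128` and the `model_*` helpers are hypothetical stand-ins for a NEON register and the PMULL/PMULL2/DUP instructions, and `clmul64` is a plain bit-loop carry-less multiply. It shows that after a single dup of the scalar operand, pmull2 reads the high lane of the other operand directly (no fmov to the low half), and the same dup-ed register also serves a pmull on the low lane.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 128-bit NEON register holding two 64-bit lanes. */
typedef struct { uint64_t lane[2]; } v2i64;

/* 128-bit result split into low and high 64-bit halves. */
typedef struct { uint64_t lo, hi; } u128;

/* Carry-less (GF(2) polynomial) multiply of two 64-bit values,
   the per-element operation PMULL performs. */
static u128 clmul64(uint64_t a, uint64_t b) {
    u128 r = {0, 0};
    for (int i = 0; i < 64; i++) {
        if ((b >> i) & 1) {
            r.lo ^= a << i;
            if (i != 0)              /* a >> 64 would be undefined */
                r.hi ^= a >> (64 - i);
        }
    }
    return r;
}

/* pmull Vd.1q, Vn.1d, Vm.1d: multiplies lane 0 of both sources. */
static u128 model_pmull(v2i64 n, v2i64 m) { return clmul64(n.lane[0], m.lane[0]); }

/* pmull2 Vd.1q, Vn.2d, Vm.2d: multiplies lane 1 of both sources. */
static u128 model_pmull2(v2i64 n, v2i64 m) { return clmul64(n.lane[1], m.lane[1]); }

/* dup Vd.2d: broadcast a 64-bit scalar into both lanes. */
static v2i64 model_dup(uint64_t x) { v2i64 v = {{x, x}}; return v; }

/* Self-check: one dup of x feeds both pmull (low lane of a)
   and pmull2 (high lane of a), with no fmov of a's high half. */
static void self_test(v2i64 a, uint64_t x) {
    v2i64 d = model_dup(x);
    u128 lo = model_pmull(a, d), hi = model_pmull2(a, d);
    u128 want_lo = clmul64(a.lane[0], x), want_hi = clmul64(a.lane[1], x);
    assert(lo.lo == want_lo.lo && lo.hi == want_lo.hi);
    assert(hi.lo == want_hi.lo && hi.hi == want_hi.hi);
}
```

For example, `self_test((v2i64){{0x1234, 1ULL << 63}}, 2)` passes: the high lane yields x^64 (`hi == 1`, `lo == 0`) and the low lane yields `0x2468`, both from the same dup-ed register.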
Are you intentionally avoiding the variant of "dup" that takes a GPR operand (DUPv2i64gpr)?